But don't call it too early; we only want to call it once we have
prepared the target.
This way, we guarantee that a GL context is always available and that it
is bound to the correct target.
Don't pass texture + rect, but instead have
gdk_memory_texture_new_subtexture()
and use it to generate subtextures and pass them.
This has the advantage of downloading a too-large texture only once
instead of N times.
It does not belong in GdkGLContext, it's a renderer thing.
It's also the only user of that API.
Introduce gdk_gl_context_check_version() private API to make version
checks simpler.
It turns out glReadPixels() cannot convert pixels and you are only
allowed to pass a single value into the function arguments. You need to
know which ones or things will explode.
GL is great.
Pass a format to GdkTextureClass::download(). That way we can download
data in any format.
Also replace gdk_texture_download_texture() with
gdk_memory_texture_from_texture() which again takes a format.
The old functionality is still there for code that wants it: Just pass
gdk_texture_get_format (texture) as the format argument.
Move the resources of each renderer to its subdirectory.
We've previously done that for the ngl renderer, but it
is better to be consistent and do it for all the renderers.
When we are rendering a texture node to an offscreen,
and we have a clip, we must force the offscreen rendering.
Otherwise, the code will notice: Hey, it already is a texture
node, so no need to render it to a texture again. But when
clipping is involved, that is exactly what we want to do.
Testcase included.
Fixes: #3651
According to the OpenGL spec, deleting a shader object that is still
attached to a program object only flags it for deletion; when a program
object is deleted, the shader objects attached to it are detached but
not deleted unless they have already been flagged for deletion.
So we should detach a shader object before deleting it, and ideally
delete it before the program object is deleted.
This way we can render the first frame of tests/testoutsetshadowdrawing
in 153 ops instead of 183.
And the first frame of gtk4-demo in 260 instead of 300.
These positions are not guaranteed to be in a specific order when linked
into the final GPU program. They need to be specified so that our code
in gskglrenderer.c can use known positions for them to match up with
our GskQuadVertex.
This fixes the GL renderer on macOS's OpenGL shader compiler.
Fixes #3420
Catch the error when it happens, so that we can emit a specific and more
helpful error message.
Also verify that all branches in the code now do indeed set a proper
GError when they fail, so that the final catch-all is no longer needed.
Instead, assert that the error is set so that we catch future code
additions early that do not set the GError.
glFramebufferTexture maps to all faces of a cube and that is not needed
here. Additionally, texture_id is not deleted after we use the additional
flipped texture, but should be.
On desktop GL, GL 1.5 or GL_ARB_occlusion_query is required to get the
glGenQueries() etc. symbols. This isn’t the case on GLES, where they
are provided by GL_EXT_occlusion_query_boolean, and more importantly
have never been made core.
This patch allows gtk4-demo to start when GDK_DEBUG=gl-gles is set, on
my Mali 400 MP running the Lima driver from Mesa.
We're caching two things, either a node itself being rendered, or a
parent storing a cached version of a child as rendered to an offscreen
with the size and location of the parent.
If both the parent and the child use the cache, this causes a conflict,
because the cache is currently keyed on the node pointer, which has
the same value for the node-as-itself and the child-node-of-the-parent.
We fix this by adding another part to the key "pointer_is_child" which means
we can have the same node pointer twice in the cache.
Additionally, in the child-is-rendered-offscreen case the offscreen
result actually depends on the position and size of the parent viewport,
so we need to store the parent bounds in that case.
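A minimal sketch of such a key, assuming a plain struct compared field by
field. The names CacheKey, Rect and cache_key_equal() are hypothetical and
are not the ones used in the renderer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, width, height; } Rect;

/* Hypothetical cache key; field names are illustrative only. */
typedef struct
{
  const void *node;          /* the render node pointer */
  bool        pointer_is_child; /* true: node cached as a child rendered for its parent */
  Rect        parent_bounds; /* only meaningful when pointer_is_child is true */
} CacheKey;

static bool
rect_equal (const Rect *a, const Rect *b)
{
  return a->x == b->x && a->y == b->y &&
         a->width == b->width && a->height == b->height;
}

static bool
cache_key_equal (const CacheKey *a, const CacheKey *b)
{
  assert (a != NULL && b != NULL);

  /* Same pointer used in two different roles must not collide. */
  if (a->node != b->node || a->pointer_is_child != b->pointer_is_child)
    return false;

  /* The parent viewport only matters in the child-rendered-offscreen case. */
  if (a->pointer_is_child)
    return rect_equal (&a->parent_bounds, &b->parent_bounds);

  return true;
}
```

With this, the same node pointer can appear once as itself and once per
distinct parent viewport.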
This allows us to avoid updating uniforms if that is not necessary. This
in turn allows us to sometimes reuse the same draw op by just extending the
vertex array size we draw rather than doing a separate glDraw call.
For example, in the fishbowl demo, all the icons added at the same
time will have the same time and size, so we emit single draw calls
with 100s of triangles instead of 100s of draw calls with 2 triangles.
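The merging can be sketched roughly as follows. This is an illustration
only: DrawOp, Batch and batch_add_quad() are made-up names, and a single
integer stands in for "all uniform values are unchanged".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 64
#define VERTICES_PER_QUAD 6   /* two triangles */

/* Hypothetical draw op: the state it needs and the slice of the
 * shared vertex array it draws. */
typedef struct
{
  int    program_id;
  int    uniform_state;   /* stand-in for the full set of uniform values */
  size_t vertex_offset;
  size_t vertex_count;
} DrawOp;

typedef struct
{
  DrawOp ops[MAX_OPS];
  size_t n_ops;
  size_t n_vertices;
} Batch;

static void
batch_add_quad (Batch *batch, int program_id, int uniform_state)
{
  DrawOp *last = batch->n_ops > 0 ? &batch->ops[batch->n_ops - 1] : NULL;

  if (last != NULL &&
      last->program_id == program_id &&
      last->uniform_state == uniform_state)
    {
      /* Nothing changed: extend the previous draw instead of adding one. */
      last->vertex_count += VERTICES_PER_QUAD;
    }
  else
    {
      DrawOp *op;

      assert (batch->n_ops < MAX_OPS);
      op = &batch->ops[batch->n_ops++];
      op->program_id = program_id;
      op->uniform_state = uniform_state;
      op->vertex_offset = batch->n_vertices;
      op->vertex_count = VERTICES_PER_QUAD;
    }

  batch->n_vertices += VERTICES_PER_QUAD;
}
```

Adding 100 quads with identical state yields one op covering 600 vertices;
the first quad with different state starts a new op.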
For vulkan/broadway this just means to ignore it, but for the gl
backend we support it (with up to 4 texture inputs, which is similar to
what shadertoy does, so should be widely supported).
Print out the full assembled shader sources when
GSK_DEBUG=shaders is given. This is very verbose,
but may be useful to see what we actually pass
to the compiler.
Almost always the source is created by combining various sources, which
means the line numbers in the error messages are hard to use. Adding
the line numbers to the source in the error message helps with this.
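A sketch of how the numbering could be done, assuming the assembled source
is one NUL-terminated string. prepend_line_numbers() is a hypothetical
helper, not the function used in the renderer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies `source` into `out`, prefixing every line with "  N: ".
 * Returns the number of lines processed. Stops early if `out`
 * is too small. Hypothetical helper for illustration. */
static int
prepend_line_numbers (const char *source, char *out, size_t out_size)
{
  size_t used = 0;
  int line = 0;
  const char *p = source;

  assert (out_size > 0);
  out[0] = '\0';

  while (*p != '\0')
    {
      const char *end = strchr (p, '\n');
      size_t len = end ? (size_t) (end - p) : strlen (p);
      int n;

      line++;
      n = snprintf (out + used, out_size - used, "%3d: %.*s\n",
                    line, (int) len, p);
      if (n < 0 || (size_t) n >= out_size - used)
        break; /* output truncated, give up */
      used += (size_t) n;

      p += len;
      if (*p == '\n')
        p++;
    }

  return line;
}
```

The numbered text can then be appended to the compiler's error message, so
a reported line number can be looked up directly.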
There is no real reason to have this on the side indexed via the
index, as it is stored next to each other anyway. Plus, storing them
together lets us use `Program` structures not in the array.
I found that the gears demo was spending 40% cpu
downloading a GL texture every frame, only to
upload it again to another context.
While the GSK rendering and the GtkGLArea use different
GL contexts, they are (usually) connected by sharing data
with the same global context, so we can just use the
texture without the download/upload dance. This brings
gears down to < 10% cpu.
Do custom uploads rather than using gdk_cairo_surface_upload_to_gl(),
because this way we avoid a roundtrip (memcpy and possibly conversion)
to the cairo image surface format.
GLES doesn't support the GL_BGRA + GL_UNSIGNED_INT_24_8 hack that
we use on desktop OpenGL to upload textures directly in the cairo
pixel format. This adds the required conversions to all the places
that currently need it.
We also add a data_format to the internal gdk_gl_context_upload_texture()
function to make it clearer what the format is. Currently it is always
the cairo image surface format, but eventually we want to support other
formats so that we can avoid some of the unnecessary conversions we do.
Also, the current gdk_gl_context_upload_texture() code always converts
to a cairo format and uploads that like we did before. Later commits
will allow this to use other upload formats that gl supports to avoid
conversions.
We need to include both the scale and the filtering
in the key for the texture cache, since those affect
the texture.
This fixes misrendering in the recorder in the inspector
whenever transforms are involved. An example where this
was showing up is testrevealer's swing transition.
When rendering to an offscreen because of transforms,
check if transforming the bounds of the node results
in a non-axis-aligned quad. If it doesn't, we want
GL_NEAREST interpolation to get sharp edges. Otherwise,
we use GL_LINEAR to get better results for things
that are actually transformed.
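One way to express that check, as a sketch: transform three corners of the
bounds with a 2D affine matrix and see whether the edges are still
horizontal and vertical. Affine, Filter and choose_filter() are hypothetical
names, and the exact float comparison is a simplification; real code may
want a tolerance.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float xx, xy, yx, yy, x0, y0; } Affine;
typedef struct { float x, y, width, height; } Rect;
typedef struct { float x, y; } Point;
typedef enum { FILTER_NEAREST, FILTER_LINEAR } Filter;

/* x' = xx*x + xy*y + x0,  y' = yx*x + yy*y + y0 */
static Point
affine_apply (const Affine *m, float x, float y)
{
  Point p;

  p.x = m->xx * x + m->xy * y + m->x0;
  p.y = m->yx * x + m->yy * y + m->y0;
  return p;
}

static Filter
choose_filter (const Affine *m, const Rect *bounds)
{
  Point p0, p1, p2;
  bool upright, quarter_turn;

  assert (bounds->width > 0 && bounds->height > 0);

  p0 = affine_apply (m, bounds->x, bounds->y);
  p1 = affine_apply (m, bounds->x + bounds->width, bounds->y);
  p2 = affine_apply (m, bounds->x + bounds->width,
                     bounds->y + bounds->height);

  /* Top edge stays horizontal and right edge stays vertical ... */
  upright = p0.y == p1.y && p1.x == p2.x;
  /* ... or the two swap roles (rotation by a multiple of 90 degrees). */
  quarter_turn = p0.x == p1.x && p1.y == p2.y;

  return (upright || quarter_turn) ? FILTER_NEAREST : FILTER_LINEAR;
}
```

Scales, translations and quarter turns keep the quad axis-aligned and get
nearest filtering; a shear or an arbitrary rotation gets linear filtering.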
Track what we really need to send for inset shadows, which are used
as a border replacement in many cases.
Fishbowl says I can draw around 200-300 more switches per frame like
this too.
This fixes the widget factory rendering too much.
In the widget-factory, we generally have a pretty small update area (two
spinners and a progressbar). We take the extents of that as an update
area and initial clip.
However, the first clip node we see is from the toplevel window, which
essentially increases the clip again to almost the entire window.
Fix that by ignoring such cases.
If the inner clip intersects with the corners of the outer clip, we
potentially need a texture. We should add more fine-grained checks for
this in the future though.
Test case included.
Language bindings—especially ones based on introspection—cannot deal
with custom type hierarchies. Luckily for us, GType has a derivable type
with low overhead: GTypeInstance.
By turning GskRenderNode into a GTypeInstance, and creating derived
types for each class of node, we can provide an introspectable API to
our non-C API consumers, with no functional change to the C API itself.
Sprinkle various g_assert() around the code where gcc cannot figure out
on its own that a variable is not NULL and too much refactoring would be
needed to make it do that.
Also fix usage of g_assert_nonnull(x) to use g_assert(x) because the
first is not marked as G_GNUC_NORETURN because of course GTester
supports not aborting on aborts.
Some systems (notably macOS) will not allow enumeration of an extension that has been promoted to core OpenGL for the context in use. This change assumes that GL_ARB_timer_query is available on OpenGL 3.3+.
I could not find definitive information on whether GL_ARB_debug_output or GL_KHR_debug have been added to core. Other extensions in use were addressed by https://gitlab.gnome.org/GNOME/gtk/merge_requests/1422 .
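The resulting check is roughly the following. This is a sketch with a
hypothetical helper name; the caller is assumed to have already queried the
context version and whether the extension shows up in the extension list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: timer queries are usable if the context is
 * OpenGL 3.3 or newer (where the functionality is core), or if the
 * driver lists GL_ARB_timer_query explicitly. */
static bool
has_timer_query (int gl_major, int gl_minor, bool extension_listed)
{
  assert (gl_major >= 1 && gl_minor >= 0);

  if (gl_major > 3 || (gl_major == 3 && gl_minor >= 3))
    return true;   /* core since 3.3, may not be enumerated */

  return extension_listed;
}
```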
Commit 47c44644b1 was a bit overzealous in fixing
compiler warnings. We still need to call collect_textures,
even if we don't need the number that it returns.
These don't take a duration; instead they call g_get_monotonic_time()
and subtract the start time from it.
Almost all our calls are like this, and this makes the callsites clearer
and avoids inlining the clock call into the call site.
When we use if (GDK_PROFILER_IS_RUNNING) this means we get an
inlined if (FALSE) when the profiler support is not compiled in, which
gets rid of all the related code completely.
We also expand to G_UNLIKELY(gdk_profiler_is_running ()) in the supported
case which might cause somewhat better code generation.
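The macro pattern looks roughly like this. A sketch only: HAVE_PROFILER,
PROFILER_IS_RUNNING and profiler_is_running() are stand-ins for the real
names, and __builtin_expect is the GCC/Clang builtin that branch hints like
G_UNLIKELY are typically built on.

```c
#include <assert.h>
#include <stdbool.h>

/* HAVE_PROFILER is a hypothetical build flag for this sketch. */
#ifdef HAVE_PROFILER
static bool profiler_running = false;

static inline bool
profiler_is_running (void)
{
  return profiler_running;
}

/* Supported case: a real check, hinted as unlikely. */
#define PROFILER_IS_RUNNING (__builtin_expect (profiler_is_running (), 0))
#else
/* Unsupported case: a constant, so the guarded code is dropped entirely. */
#define PROFILER_IS_RUNNING (false)
#endif

static int marks_emitted = 0;

static void
do_work (void)
{
  /* ... the actual work would go here ... */

  if (PROFILER_IS_RUNNING)
    marks_emitted++;   /* stand-in for emitting a profiler mark */
}
```

Without the flag the branch is a compile-time constant and the compiler can
remove the mark code altogether.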
usec is the scale of the monotonic timer which is where we get almost
all the times from. The only actual source of nsec is the OpenGL
GPU time (but who knows what the actual resolution of that is).
Changing this to usec allows us to get rid of " * 1000" in a *lot* of
places all over the codebase, which are ugly and confusing.
This is similar to how we share texture atlases. Some added complexity
in that the program state also needed to be shared, so it had to move to
the shared Programs object.
With this change realization of additional GskRenderers when opening
popups went from ~60 msec to ~35 msec on average.
When rendering ops to an offscreen texture we take max-texture-size
in consideration and modify the scale we use such that the required
texture does not exceed the limit.
This means some rendering will be blocky/fuzzy, but that is better
than it being clipped.
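The scale adjustment amounts to something like the sketch below.
limit_scale_to_texture_size() is a hypothetical name; a real implementation
also has to round the resulting pixel size and make sure the rounding does
not push it over the limit again.

```c
#include <assert.h>

/* Returns a scale no larger than `scale` such that both
 * width * scale and height * scale fit into max_texture_size. */
static float
limit_scale_to_texture_size (float width,
                             float height,
                             float scale,
                             int   max_texture_size)
{
  float max = (float) max_texture_size;

  assert (width > 0 && height > 0 && scale > 0 && max_texture_size > 0);

  if (width * scale > max)
    scale = max / width;

  /* Checked after the width clamp, so the smaller of the two wins. */
  if (height * scale > max)
    scale = max / height;

  return scale;
}
```

For an 8192 pixel wide node and a 4096 limit, the scale drops from 1.0 to
0.5: the content is rendered at half resolution rather than cut off.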
It would probably be better to not do this and always render the outline
in plain white, then later recolor it, but do this for now, just for
correctness.
Instead of loading the unflipped version first and then flipping it.
Don't do it in add_render_ops either but only in the function actually
adding the render ops for the nodes, since those frequently have
early-out conditions that don't need the vertex data at all.
When attaching renderer-specific data, we need to
make sure that we key it off the renderer that is
in use, and cope with the absence of render data.
This fixes recording nodes in the inspector.
Return a pointer to the IconData struct. This is
closer to the glyph cache api, and will allow us
to add similar shortcuts. For now, just store
texture coords in the form we need, avoiding
converting them over and over.
This is a quick implementation that avoids many
glyph cache lookups. We keep an array of direct
pointers in the text render node, and throw those
cached pointers away whenever any atlases have
been dropped (since that may invalidate the cached
glyphs).
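A sketch of the idea, under the assumption that "atlases have been dropped"
is tracked with a generation counter on the cache. That counter and all
names below are hypothetical; the commit itself only says the pointers are
thrown away.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GLYPHS 16

typedef struct { int atlas_id; float tx, ty; } CachedGlyph;

typedef struct
{
  unsigned    atlas_generation; /* assumed: bumped when an atlas is dropped */
  int         n_lookups;        /* counts slow-path lookups, for the demo */
  CachedGlyph storage[MAX_GLYPHS];
} GlyphCache;

typedef struct
{
  unsigned           seen_generation;
  const CachedGlyph *glyphs[MAX_GLYPHS]; /* direct pointers into the cache */
} TextNode;

/* Stand-in for the real (hash table) lookup. */
static const CachedGlyph *
glyph_cache_lookup (GlyphCache *cache, int i)
{
  cache->n_lookups++;
  return &cache->storage[i];
}

static const CachedGlyph *
text_node_get_glyph (TextNode *node, GlyphCache *cache, int i)
{
  assert (i >= 0 && i < MAX_GLYPHS);

  if (node->seen_generation != cache->atlas_generation)
    {
      /* Atlases changed: the cached pointers may be stale, drop them all. */
      for (int j = 0; j < MAX_GLYPHS; j++)
        node->glyphs[j] = NULL;
      node->seen_generation = cache->atlas_generation;
    }

  if (node->glyphs[i] == NULL)
    node->glyphs[i] = glyph_cache_lookup (cache, i);

  return node->glyphs[i];
}
```

Repeated frames with the same text node hit the pointer array and skip the
lookup until an atlas is dropped.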
In many cases of the switch, we do not need the vertex data. This moves
the creation of the vertex_data array into a secondary function and only
calculates it in the cases for which it is required.
We were putting big glyphs in the cache, in their
own texture, but forgetting to mark the texture
as permanent, so it could be reused, leading to
occasional misrendering. Fix this by marking these
textures as permanent, and explicitly freeing them
when the cache entry gets old.
Every few frames, we do extra work for the
cache aging. Arrange for the glyph and icon
caches to not cause extra work on the same
frame, to smooth things out.
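One simple way to stagger the two, as a sketch: age both caches at the same
interval but offset one of them by half a period. AGING_INTERVAL and the
function names are hypothetical, and the interval of 60 frames is an
arbitrary choice for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define AGING_INTERVAL 60   /* frames between aging passes (assumed value) */

/* The glyph cache ages at the start of each interval ... */
static bool
glyph_cache_ages_on (unsigned long frame)
{
  return frame % AGING_INTERVAL == 0;
}

/* ... and the icon cache halfway through, so the two never coincide. */
static bool
icon_cache_ages_on (unsigned long frame)
{
  return frame % AGING_INTERVAL == AGING_INTERVAL / 2;
}
```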
There is no need for us to be very precise about
aging the glyph entries. It is enough to check
occasionally and mark old entries. This reduces
the overhead of work we do every frame on the
caches, at the cost of letting glyphs linger
a bit longer in the cache.
Make this function more similar to the icon
cache equivalent, and simplify it a bit. We
don't use the boolean return, and we don't need
to look at the age of entry when marking it
used.
Remember which atlases were removed, and only
check those when looking for icons or glyphs
to remove. For most frames, we don't have to
check at all since no atlases were removed.
Instead of copying the (rather large) RenderOp to the GArray, we can
simply set the fields directly in the allocated space for the struct.
In most cases, there won't be any allocations to make as the array size
is kept intact across frame renderings.
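The pattern can be sketched with a plain growable array instead of GArray.
OpArray, op_array_add() and op_array_reset() are hypothetical names and the
RenderOp here is a dummy. Note that the returned pointer is only valid until
the next add, since growing may move the storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Dummy stand-in for the (rather large) real op struct. */
typedef struct { int kind; float data[32]; } RenderOp;

typedef struct
{
  RenderOp *ops;
  size_t    len;
  size_t    capacity;
} OpArray;

/* Reserves one slot and returns a pointer to it, so the caller
 * fills in fields in place instead of copying a whole struct. */
static RenderOp *
op_array_add (OpArray *array)
{
  if (array->len == array->capacity)
    {
      size_t new_capacity = array->capacity ? array->capacity * 2 : 16;
      RenderOp *new_ops = realloc (array->ops,
                                   new_capacity * sizeof (RenderOp));

      assert (new_ops != NULL);
      array->ops = new_ops;
      array->capacity = new_capacity;
    }

  return &array->ops[array->len++];
}

/* Called between frames: forget the ops but keep the allocation. */
static void
op_array_reset (OpArray *array)
{
  array->len = 0;
}
```

After the first frame has grown the array, later frames of similar size
reuse the same storage and allocate nothing.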
We can just use memcmp here because even in the use of lookup keys with
C99 initializers, we can rely on any space between fields added by the
compiler to be zeroed. So we might as well use wider memory compares.
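For illustration, a byte-wise key comparison might look like the sketch
below. It takes the more conservative route of zeroing the whole key with
memset() before filling in the fields, a common way to keep padding bytes
predictable in practice. LookupKey and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lookup key; the char is likely followed by padding. */
typedef struct
{
  char  filter;
  float scale;
  int   texture_id;
} LookupKey;

static void
lookup_key_init (LookupKey *key, char filter, float scale, int texture_id)
{
  /* Zero everything first, padding included, then set the fields. */
  memset (key, 0, sizeof *key);
  key->filter = filter;
  key->scale = scale;
  key->texture_id = texture_id;
}

static bool
lookup_key_equal (const LookupKey *a, const LookupKey *b)
{
  /* One wide compare instead of a chain of per-field checks. */
  return memcmp (a, b, sizeof *a) == 0;
}
```

A byte-wise compare treats floats by representation, so 0.0 and -0.0 count
as different keys; that is usually acceptable for a cache.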
We can't just assume that the pointer we're using as a cache key will
stay unique forever. The texture might be freed, and a later allocated
texture might have the same address now, causing the cache to return
incorrect results.