If the size changes, we need to relayout the tiles. Otherwise we can keep
using what we had before. Generally, that shouldn't happen, but the
previous check was failing in a number of ways.
It looks like, particularly on the M1, we might need to double buffer the
contents of the IOSurface<->OpenGL texture bindings. This doesn't appear
to show up on the Intel macbooks I've tried, but I've seen it in the wild
on an M1.
If we have a 2x scale laptop with a 1x scale external display, we would
need to create a new IOSurface for the external display once it crosses
a boundary, otherwise we won't have something capable of displaying
correctly on the second monitor.
This provides a major shift in how we draw both when accelerated OpenGL
as well as software rendering with Cairo. In short, it uses tiles of Core
Animation's CALayer to display contents from an OpenGL or Cairo rendering
so that the window can provide partial damage updates. Partial damage is
not generally available when using OpenGL as the whole buffer is flipped
even if you only submitted a small change using a scissor rect.
Thankfully, this speeds up Cairo rendering a bit too by using IOSurface to
upload contents to the display server. We use the tiling system we do for
OpenGL which reduces overall complexity and differences between them.
A New Buffer
============
GdkMacosBuffer is a wrapper around an IOSurfaceRef. The term buffer was
used because 1) surface is already used and 2) it loosely maps to a
front/back buffer semantic.
However, it appears that IOSurfaceRef contents are being retained in
some fashion (likely in the compositor result) so we can update the same
IOSurfaceRef without flipping as long as we're fast. This appears to be
what Chromium does as well, but Firefox uses two IOSurfaceRef and flips
between them. We would like to avoid two surfaces because it doubles the
GPU VRAM requirements of the application.
Changes to Windows
==================
Previously, the NSWindow would dynamically change between different
types of NSView based on the renderer being used. This is no longer
necessary as we just have a single NSView type, GdkMacosView, which
inherits from GdkMacosBaseView just to keep the tedius stuff separate
from the machinery of GdkMacosView. We can merge those someday if we
are okay with that.
Changes to Views
================
GdkMacosCairoView, GdkMacosCairoSubView, GdkMacosGLView have all been
removed and replaced with GdkMacosView. This new view has a single
CALayer (GdkMacosLayer) attached to it which itself has sublayers.
The contents of the CALayer is populated with an IOSurfaceRef which
we allocated with the GdkMacosSurface. The surface is replaced when
the NSWindow resizes.
Changes to Layers
=================
We now have a dedicated GdkMacosLayer which contains sublayers of
GdkMacosTile. The tile has a maximum size of 128x128 pixels in device
units.
The GdkMacosTile is partitioned by splitting both the transparent
region (window bounds minus opaque area) and then by splitting the
opaque area.
A tile has either translucent contents (and therefore is not opaque) or
has opaque contents (and therefore is opaque). An opaque tile never
contains transparent contents. As such, the opaque tiles contain a black
background so that Core Animation will consider the tile's bounds as
opaque. This can be verified with "Quartz Debug -> Show opaque regions".
Changes to Cairo
================
GTK 4 cannot currently use cairo-quartz because of how CSS borders are
rendered. It simply causes errors in the cairo_quartz_surface_t backend.
Since we are restricted to using cairo_image_surface_t (which happens to
be faster anyway) we can use the IOSurfaceBaseAddress() to obtain a
mapping of the IOSurfaceRef in user-space. It always uses BGRA 32-bit
with alpha channel even if we will discard the alpha channel as that is
necessary to hit the fast paths in other parts of the platform. Note
that while Cairo says CAIRO_FORMAT_ARGB32, it is really 32-bit BGRA on
little-endian as we expect.
OpenGL will render flipped (Quartz Native Co-ordinates) while Cairo
renders with 0,O in the top-left. We could use cairo_translate() and
cairo_scale() to reverse this, but it looks like some cairo things may
not look quite as right if we do so. To reduce the chances of one-off
bugs this continues to draw as Cairo would normally, but instead uses
an CGAffineTransform in the tiles and some CGRect translation when
swapping buffers to get the same effect.
Changes to OpenGL
=================
To simplify things, removal of all NSOpenGL* related components have
been removed and we strictly use the Core GL (CGL*) API. This probably
should have been done long ago anyay.
Most examples found in the browsers to use IOSurfaceRef with OpenGL are
using Legacy GL and there is still work underway to make this fit in
with the rest of how the GSK GL renderer works.
Since IOSurfaceRef bound to a texture/framebuffer will not have a
default framebuffer ID of 0, we needed to add a default framebuffer id
to the GdkGLContext. GskGLRenderer can use this to setup the command
queue in such a way that our IOSurface destination has been
glBindFramebuffer() as if it were the default drawable.
This stuff is pretty slight-of-hand, so where things are and what needs
flushing when and where has been a bit of an experiment to see what
actually works to get synchronization across subsystems.
Efficient Damages
=================
After we draw with Cairo, we unlock the IOSurfaceRef and the contents
are uploaded to the GPU. To make the contents visible to the app,
we must clear the tiles contents with `layer.contents=nil;` and then
re-apply the IOSurfaceRef. Since the buffer has likely not changed, we
only do this if the tile overlaps the damage region.
This gives the effect of having more tightly controlled damage regions
even though updating the layer would damage be the whole window (as it
is with OpenGL/Metal today with the exception of scissor-rect).
This too can be verified usign "Quartz Debug -> Flash screen udpates".
Frame Synchronized Resize
=========================
In GTK 4, we have the ability to perform sizing changes from compute-size
during the layout phase. Since the macOS backend already tracks window
resizes manually, we can avoid doing the setFrame: immediately and instead
do it within the frame clock's layout phase.
Doing so gives us vastly better resize experience as we're more likely to
get the size-change and updated-contents in the same frame on screen. It
makes things feel "connected" in a way they weren't before.
Some additional effort to tweak gravity during the process is also
necessary but we were already doing that in the GTK 4 backend.
Backporting
===========
The design here has made an attempt to make it possible to backport by
keeping GdkMacosBuffer, GdkMacosLayer, and GdkMacosTile fairly
independent. There may be an opportunity to integrate this into GTK 3's
quartz backend with a fair bit of work. Doing so could improve the
situation for applications which are damage-rich such as The GIMP.
There are situations where our "default framebuffer" is not actually
zero, yet we still want to apply a scissor rect.
Generally, 0 is the default framebuffer. But on platforms where we need
to bind a platform-specific feature to a GL_FRAMEBUFFER, we might have a
default that is not 0. For example, on macOS we bind an IOSurfaceRef to
a GL_TEXTURE_RECTANGLE which then is assigned as the backing store for a
framebuffer. This is different than using gsk_gl_renderer_render_texture()
in that we don't want to incur an extra copy to the destination surface
nor do we even have a way to pass a texture_id into render_texture().
The GtkFileCHooserNativeQuartz injects a NSComboBox into the NSSavePanel
(which is displayed in a remote process since 10.15 whether or not you are
a sandboxed application). The style has changed and we need more space
here to not clip part of the combobox out of view.
I tried every size from 22 to 30 and this seemed to look the most natural
without skewing the location of the text within the combobox.
Previously, the popover would cause the window to go into the :backdrop
state which is not what we want for consistency with other platforms. This
fixes that by walking up the surface chain when we get notified of
loosing or acquiring "key" input from the display server.
If the rendering operation is over an opaque region, we can potentially
avoid clearing a large section of the framebuffer destination. Some cases
you do want to clear, such as when clearing the whole contents as some
drivers have fast paths for that to avoid bringing data back into the
framebuffer.
One may be using IJG libjpeg or libjpeg-turbo to build GTK, and their
build files may or may not generate pkg-config files for us. To make
things easier, we can make use of CMake's built-in support for finding
IJG libjpeg or libjpeg-turbo.
The CMake build files for libtiff may or may not generate pkg-config
files for us, so we can use Meson's CMake support to help us find
libtiff, as CMake has built-in support for finding libtiff.
Add a variable in meson.build that covers Visual Studio-like compilers,
so that we can use it to help us find depedencies using CMake rather
than via pkg-config, where applicable.
Change the existing use case for finding libpng accordingly.
We might have panels with controls in them where the window is running in
another process. The control could have a wrapper window which we would
see from this process. This can happen with the GtkFileChooserNative, but
any NSSavePanel in macOS 10.15+ is out of process (not just sandboxed
applications).
This significantly cleans up how we handle various move-resize, compute-
size, and configure (notification of changes) in the macOS GDK backend.
Originally when prototyping this backend, there were some bits that came
over from the quartz backend and some bits which did not. It got confusing
and so this makes an attempt to knock down all that technical debt.
It is much simpler now in that the GdkMacosSurface makes requests of the
GdkMacosWindow, and the GdkMacosWindow notifies the GdkMacosSurface of
changes that happen.
User resizes are delayed until the next compute-size so that we are much
closer to the layout phase, reducing chances for in-between frames.
This also improves the situation of leaving maximized state so that a
grab and drag feels like you'd expect on other platforms.
I removed the opacity hack we had in before, because that is all coming
out anyway and it's a bit obnoxious to maintain through the async flows
here.
This fixes GTK's NSWindow for toplevels so that they are allowed to enter
fullscreen. We were already handlign the state transitions from the
setStyleMask: halper, but we didn't previously tell the window we are
allowed to transition into that.
There is a bit of a mismatch here in that GTK doesn't have any such flag
that determines if a window is "allowed" by policy to enter fullscreen
since window managers on Linux are free to do that at will.
This makes it easier to figure out those values (which are mentioned in
the GtkApplication documentation) rather than working that out from the
way they're generated (or documented as being generated).
If we have GStreamer on macOS we likely have support for CGL to get an
OpenGL context we can use. This provides the missing pieces to get
accelerated video playback in gtk4-widget-factory working.
This more than halves the total runtime of this function since the
previous commit, from 8.36% to 4.02%, and is most likely memory
bandwidth limited on this specific board now.
I tried to do a SSE2 version as well, but couldn’t find any equivalent
of the LD4/ST4 ARM instruction.
On x86 on a Kaby Lake CPU, this makes it go from 6.63% of the total
execution time (loading some PNGs using the cairo backend) down to
3.20%.
On ARM on a Cortex-A7, on the same workload, this makes it go from 57%
to 8.36%.