Keep at least 1 second of frame timings.
This is necessary for 2 reasons - a real one and a fun one.
First, with the difference in monitor refresh rates, we can have 48Hz
laptops as well as 240Hz high refresh rate monitors. That's a factor of
5, and tracking frame rates reliably in both situations is kind of hard
- either we track over too many frames and the fps take a long time to
adjust, or we track too little time and the fps fluctuate wildly.
Second, when benchmarking with GDK_DEBUG=no-vsync with a somewhat fast
renderer (*cough*Vulkan*cough*) frame rates can reach absurd heights,
and only very few frames actually get presentation times reported.
So to report accurate frame rates in those cases, we need a
*very* large history that can be 1000s of times larger than the usual
history. And that's just a waste for normal usage.
Previously, our reported fps numbers could be too low when the start
timings weren't complete. In that case we would use the frame time, but
the frame time is the time when the frame was rendered, which is quite a
few milliseconds before it is presented.
So in that case we would not report the difference in presentation
times, but the difference from start of rendering. However, those times
are way more variable and can smear over the whole frame because they
depend on when we received the frame callbacks to high priority GSources
as well as our own render time predictions.
This happened in particular with GDK_DEBUG=no-vsync and could report
numbers that are off by a factor of 2.
Now we skip any incomplete frames, because those frames never have
presentation times reported. This makes it theoretically more likely
that we cannot report fps at all, but I'd rather have no fps than fps
that are off by a factor of 2.
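A minimal sketch of an fps computation along these lines (illustrative
only, not the actual gdk_frame_clock_get_fps() code): frames without a
reported presentation time are skipped, and if fewer than two remain, no
fps is reported at all.

    #include <glib.h>

    typedef struct
    {
      gboolean complete;
      gint64   presentation_time;   /* usec, 0 if never reported */
    } Timing;

    static double
    compute_fps (const Timing *timings, guint n_timings)
    {
      const Timing *first = NULL, *last = NULL;
      guint frames = 0;

      for (guint i = 0; i < n_timings; i++)
        {
          /* Incomplete frames never get presentation times; skip them. */
          if (!timings[i].complete || timings[i].presentation_time == 0)
            continue;

          if (first == NULL)
            first = &timings[i];
          last = &timings[i];
          frames++;
        }

      if (frames < 2)
        return 0.0;   /* better no fps than a wrong one */

      return (double) (frames - 1) * G_USEC_PER_SEC
             / (double) (last->presentation_time - first->presentation_time);
    }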
This started out as busywork, but it ends up doing many separate things.
If I could start over, I'd split them into multiple commits:
1. Remove G_ENABLE_DEBUG around GDK_DEBUG_*() calls
This is not needed at all; the calls themselves take care of it.
2. Remove G_ENABLE_DEBUG around profiling code
This now enables profiling support in release builds.
3. Stop poking _gdk_debug_flags and use GDK_DEBUG_CHECK()
This was old code that was never updated.
4. Make !G_ENABLE_DEBUG turn off GDK_DEBUG_CHECK()
The code used to

    #define GDK_DEBUG_CHECK(...) false
    #define GDK_DEBUG(...)

which would compile away all the code inside those macros. This
means a lot of variable definitions and debug utility functions
would suddenly no longer be used and cause compiler errors (see the
sketch after this list).
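To illustrate the problem (the names here are hypothetical, not the actual
GDK macros): when the disabled check expands to a bare constant, everything
referenced only inside it becomes unused; keeping the expression visible to
the compiler, even though it is constant-folded away, is one way around
that.

    #include <stdbool.h>
    #include <stdio.h>

    static unsigned int my_debug_flags;

    #ifdef MY_ENABLE_DEBUG
    #  define MY_DEBUG_CHECK(flag) ((my_debug_flags & (flag)) != 0)
    #else
    /* Old style: a bare constant, so my_debug_flags and any debug helpers
     * become unused and can fail the build with -Werror:
     *   #define MY_DEBUG_CHECK(flag) false
     * One workaround: keep the expression visible, let the optimizer drop it. */
    #  define MY_DEBUG_CHECK(flag) (false && ((my_debug_flags & (flag)) != 0))
    #endif

    int
    main (void)
    {
      if (MY_DEBUG_CHECK (1u << 0))
        printf ("debug path taken\n");
      return 0;
    }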
Not all frames get timing info with GDK_DEBUG=no-vsync, so make sure
that even when we render tons of frames, the one frame that does get
timing info is still there when the timing info arrives.
I set it to 128 now, up from 16.
This is roughly good enough to go up to 5000fps on a 60Hz monitor.
In commit c6901a8b, the frame clock reported time was changed from
simply reporting the time we ran the frame clock cycle to reporting a
smoothed value that increased by the frame interval each time it was
called.
However, this change caused some problems, such as:
https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/1415
https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/1416
https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/1482
I think a lot of this is caused by the fact that we just overwrote the
old frame time with the smoothed, monotonic timestamp, breaking
some things that relied on knowing the actual time something happened.
This is a new approach to doing the smoothing that is more explicit.
The "frame_time" we store is the actual time we ran the update cycle,
and then we separately compute and store the derived smoothed time and
its period, allowing us to easily return a smoothed time at any time
by rounding the time difference to an integer number of frames.
The initial frame_time can be somewhat arbitrary, as it depends on the
first cycle, which is not driven by the frame clock. But follow-up
cycles are typically tied to the compositor sending the drawn signal.
It may happen that the initial frame falls exactly in the middle
between two frames, where jitter causes us to randomly round in
different directions when rounding to the nearest frame. To fix this,
we additionally do a quadratic convergence towards the "real" time
during presentation-driven clock cycles (i.e. when the frame times are
small).
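A minimal sketch of the rounding part (the quadratic convergence is left
out, and these are not the actual GdkFrameClock internals): the smoothed
timestamp advances from a stored base by a whole number of frame
intervals.

    #include <glib.h>

    /* Snap "now" to the nearest whole number of frame intervals past the
     * stored smoothed base time. */
    static gint64
    compute_smoothed_time (gint64 smoothed_base,   /* last smoothed timestamp */
                           gint64 frame_interval,  /* frame period in usec */
                           gint64 now)             /* g_get_monotonic_time () */
    {
      gint64 elapsed = now - smoothed_base;
      gint64 frames = (elapsed + frame_interval / 2) / frame_interval;

      return smoothed_base + frames * frame_interval;
    }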
On my X11 + nvidia setup gnome-shell doesn't report presentation times.
However it does report refresh rate. We were mostly using this in our
calculations, except when computing the predicted presentation time, where
it fell back on the default 60Hz.
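A hypothetical sketch of what using the reported refresh interval for the
prediction can look like (not the actual GDK code; 16667 usec is the 60Hz
fallback mentioned above):

    #include <glib.h>

    static gint64
    predict_presentation_time (gint64 last_presentation_time,
                               gint64 refresh_interval,   /* usec, 0 if unknown */
                               gint64 now)
    {
      /* Only fall back to 60Hz when no refresh rate was reported at all. */
      if (refresh_interval == 0)
        refresh_interval = 16667;

      /* First vblank after "now", phase-aligned to the last presentation. */
      gint64 frames = (now - last_presentation_time) / refresh_interval + 1;

      return last_presentation_time + frames * refresh_interval;
    }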
The marks are averaged based on the name, so this makes more sense.
Also rename the map/unmap marks to have the same capitalization as
everything else.
This drops the marks for before/after-paint, as they are internal
things that very rarely use any time, and also the ones for
flush/resume-events, as any events reported there will get separate
marks and so will be easy to see anyway.
Also, rename the mark for the entire frame clock cycle from the rather
cryptic "paint_idle" to "frameclock cycle".
These don't take a duration; instead, they call g_get_monotonic_time()
and subtract the start time from it.
Almost all our calls are like this, and this makes the call sites clearer
and avoids inlining the clock call into the call site.
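The call pattern looks roughly like this (profiler_end_mark() is a
stand-in for the GDK-internal helper, so the sketch stays self-contained):

    #include <glib.h>

    /* Stand-in for the internal end-mark helper: it computes the duration
     * itself from the start timestamp. */
    static void
    profiler_end_mark (gint64 start, const char *name, const char *message)
    {
      g_message ("%s: %" G_GINT64_FORMAT " usec (%s)",
                 name, g_get_monotonic_time () - start,
                 message ? message : "");
    }

    static void
    measured_work (void)
    {
      gint64 before = g_get_monotonic_time ();

      /* ... the work being measured ... */

      profiler_end_mark (before, "layout", NULL);
    }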
When we use if (GDK_PROFILER_IS_RUNNING) this means we get an
inlined if (FALSE) when profiler support is not compiled in, which
gets rid of all the related code completely.
We also expand to G_UNLIKELY(gdk_profiler_is_running ()) in the supported
case, which might cause somewhat better code generation.
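Sketched with hypothetical names (HAVE_PROFILER and profiler_is_running()
are made up to keep the example self-contained), the macro shape described
above looks like this:

    #include <glib.h>

    static gboolean
    profiler_is_running (void)
    {
      return FALSE;   /* stand-in for the real runtime check */
    }

    #ifdef HAVE_PROFILER
    #  define PROFILER_IS_RUNNING G_UNLIKELY (profiler_is_running ())
    #else
    #  define PROFILER_IS_RUNNING FALSE
    #endif

    static void
    maybe_record_mark (void)
    {
      if (PROFILER_IS_RUNNING)
        {
          /* profiling-only work; becomes if (FALSE) and is removed
           * entirely when profiler support is not compiled in */
        }
    }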
usec is the scale of the monotonic timer which is where we get almost
all the times from. The only actual source of nsec is the opengl
GPU time (but who knows what the actual resolution of that is).
Changing this to usec allows us to get rid of " * 1000" in a *lot* of
places all over the codebase, which are ugly and confusing.
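Purely for illustration (add_mark() and its arguments are made up), this is
the kind of call-site noise that goes away:

    /* nsec-based timestamps: every call site has to scale the monotonic time */
    add_mark (g_get_monotonic_time () * 1000, duration * 1000, "css validation");

    /* usec-based timestamps: g_get_monotonic_time () is passed through as-is */
    add_mark (g_get_monotonic_time (), duration, "css validation");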
Instead of reporting the frame clock phases as defined,
report the duration of the signal emissions, which is more
useful for tracking down what is taking time.
To make a frame clock tick as long as any of the associated surfaces
expect to receive ticks, make the surfaces inhibit freezing the clock,
instead of directly telling the frame clock to freeze itself.
This makes it so that as long as any surface using a certain frame clock
is not frozen (e.g. just received a frame event from the display
server), the frame clock will not be frozen.
With this, the frame clock starts out frozen, and won't be thawed
until some surface inhibits freezing. It will be frozen again once every
surface that previously inhibited freezing has uninhibited it.
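A minimal sketch of such inhibit counting (not the actual GdkFrameClock
code; the freeze/thaw actions are only logged here):

    #include <glib.h>

    static guint freeze_inhibit_count;

    static void
    frame_clock_inhibit_freeze (void)
    {
      /* The first inhibitor thaws the clock. */
      if (freeze_inhibit_count++ == 0)
        g_message ("thaw frame clock");
    }

    static void
    frame_clock_uninhibit_freeze (void)
    {
      g_assert (freeze_inhibit_count > 0);

      /* Once the last inhibitor is gone, the clock freezes again. */
      if (--freeze_inhibit_count == 0)
        g_message ("freeze frame clock");
    }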
If we set c_marshaller manually, then g_signal_newv() will not set up a
va_marshaller for us. However, if we provide c_marshaller as NULL, it will
set up both the c_marshaller (to g_cclosure_marshal_VOID__VOID) and the
va_marshaller (to g_cclosure_marshal_VOID__VOIDv) for us.
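For illustration, a signal registered this way (the signal name and class
here are made up) would look like:

    #include <glib-object.h>

    static guint signals[1];

    static void
    my_object_class_init (GObjectClass *klass)
    {
      signals[0] =
        g_signal_newv ("changed",
                       G_TYPE_FROM_CLASS (klass),
                       G_SIGNAL_RUN_LAST,
                       NULL,        /* class closure */
                       NULL, NULL,  /* accumulator + data */
                       NULL,        /* c_marshaller: NULL gives us VOID__VOID
                                     * plus the VOIDv va_marshaller */
                       G_TYPE_NONE,
                       0, NULL);    /* no parameters */
    }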
We were adding incomplete frame timings to the profile, which led to
occasional nonsense numbers. Instead, only add timings to the profile
once we have marked them as complete. This also gives us an opportunity
to add the presentation time as a marker.
Remove all the old 2.x and 3.x version annotations.
GTK+ 4 is a new start, and from the perspective of a
GTK+ 4 developer all these APIs have been around since
the beginning.
Typically, there won't be any references on old frame timings except for
the most recent timing. So instead of discarding these and re-entering
gslice twice, just steal the old frame timing and reuse it.
https://bugzilla.gnome.org/show_bug.cgi?id=765592
These were showing up high in Sysprof profiles.
The simple fix is to avoid the emit_by_name() and let the interface emit
the signals directly. No function preconditions are provided since these
are internal API.
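Illustrated with made-up names (the GDK change does this for its internal
interface signals): emitting by id skips the per-call name lookup that
g_signal_emit_by_name() does.

    #include <glib-object.h>

    enum { CHANGED, N_SIGNALS };
    static guint signals[N_SIGNALS];

    static void
    notify_changed (gpointer instance)
    {
      /* before: g_signal_emit_by_name (instance, "changed"); */
      g_signal_emit (instance, signals[CHANGED], 0);
    }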
The g_print documentation explicitly says not to do this, since
g_print is meant to be redirected by applications. Instead use
g_message for logging that can be triggered via GTK_DEBUG.
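For example (the message itself is made up), debug output triggered via
GTK_DEBUG would go through g_message() rather than g_print():

    #include <glib.h>

    static void
    log_frame_stats (double fps)
    {
      g_message ("frame rate: %.2f fps", fps);
    }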