qt5base-lts/tests/manual/rhi/hellominimalcrossgfxtriangle/window.cpp

225 lines
6.6 KiB
C++
Raw Normal View History

// Copyright (C) 2020 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR BSD-3-Clause
#include "window.h"
#include <QPlatformSurfaceEvent>
rhi: metal: Switch back to presentDrawable This convenience should be, according to the Apple docs, equivalent to calling present from a scheduled handler. (which on its own makes it unclear why we switched in the first place) In practice it seems the two approaches are not identical. It looks like that once a frame is submitted earlier than the next display link callback, the throttling behavior we implement in beginFrame() (waiting on the semaphore for the completion of the appropriate command list etc.) starts exhibiting unexpected behavior, not correctly throttling the thread to the refresh rate. Changing back to presentDrawable does not exhibit this at all. The suspicion is that presentDrawable is probably doing more than what the docs suggest, and so is not fully equivalent to calling present manually from a scheduled handler. Therefore, switch to presentDrawable now, which restores the expected cross-platform behavior, but make a note of the oddity, and also prepare the hellominimalcrossgfxtriangle manual test to provide an easy, self-contained application to allow experimenting in the future, if needed. This allows Qt Quick render thread animations to advance at the expected speed (because the render thread is correctly throttled to the refresh rate), even if the render thread decides to generate a new frame right away, without waiting for the next display link update. Without this patch, attempting to get updates not via requestUpdate(), but by other means (timer etc.) leads to incorrect throttling, and so the triangle in the test app is rotating faster than expected - but only with Metal. Running with OpenGL on macOS or with any API on any other platform the behavior will be correct. Even if scheduling updates without display link is not efficient, and should be discouraged, not doing so cannot break the core contract of vsync throttling, i.e. the thread cannot run faster just because it renders a frame not in response to an UpdateRequest. Amends 98b60450f7ce6b16464392747ab8721f30add15e (effectively reverts but keeps the code and the notes because we might want to clear this up some day) Pick-to: 6.4 6.3 6.2 Fixes: QTBUG-103415 Change-Id: Id3bd43e94785384142337564ce4b2644bf257100 Reviewed-by: Tor Arne Vestbø <tor.arne.vestbo@qt.io>
2022-06-21 08:51:04 +00:00
#include <QTimer>
Window::Window(QRhi::Implementation graphicsApi)
: m_graphicsApi(graphicsApi)
{
switch (graphicsApi) {
case QRhi::OpenGLES2:
setSurfaceType(OpenGLSurface);
break;
case QRhi::Vulkan:
setSurfaceType(VulkanSurface);
break;
case QRhi::D3D11:
rhi: Add D3D12 support - The optional nice-to-haves DebugMarkers, Timestamps, PipelineCache are not yet implemented (features reported as false, to be implemented later, although buffer/texture resource name setting already works as-is, regardless of DebugMarkers). - Mipmap generation for 3D textures is missing. Won't matter much given that 3D textures are not used in Qt for anything atm. For generating mipmaps for 2D (or 2D array) textures, the MiniEngine compute shader and approach is used. 3D support for the mipmap generator may be added later. 1D textures / arrays are supported except for mipmap generation, and so the OneDimensionalTextureMipmaps feature is reported as false. - Qt Quick and Qt Quick 3D are expected to be fully functional. (unforeseen issues are not impossible, of course) - Uses minimum feature level 11.0 when requesting the device. It is expected to be functional on resource binding tier 1 hardware even, although this has not been verified in practice. - 2 frames in flight with the usual resource buffering (QRhiBuffer::Dynamic is host visible (UPLOAD) and always mapped and slotted, other buffers and textures are device local (DEFAULT). Requests 3 swapchain buffers. Swapchains are mostly like with D3D11 (e.g. FLIP_DISCARD and SCALING_NONE). - The root signature generation is somewhat limited by the SPIR-V binding model and that we need to map every binding point using the nativeResourceBindingMap from the QShader. Thus the root signature is laid out so each stage has its own set of resources, with shader register clashes being prevented by setting the visibility to a given stage. Sampler handling is somewhat suboptimal but we are tied by the binding model and existing API design. It is in a fairly special situation due to the 2048 limit on a shader visible sampler heap, as opposed to 1000000 for SRVs and UAVS, so the approach we use for textures (just stage the CPU SRVs on the (per-frame slot) shader visible heap as they are encountered, effectively treating the heap as a ring buffer) would quickly lead to having to switch heaps many times with scenes with many draw calls and sampledTexture/sampler bindings in the srb. Whereas static samplers, which would be beautiful, are impossible to utilize safely since we do not have that concept (i.e. samplers specified upfront, tied to the graphics/compute pipeline) in the QRhi API, and an srb used at pipeline creation may change its associated resources, such as the QRhiSampler reference, by the time the shader resources are set for the draw call (or another, compatible srb may get used altogether), so specifying the samplers at root signature creation time is impossible. Rather, the current approach is to treat each sampler as a separate root parameter (per stage) having a descriptor table with a single entry. The shader visible sampler heap has exactly one instance of each unique sampler encountered during the lifetime of the QRhi. - Shader-wise no different from D3D11, works with HLSL/DXBC 5.0 (i.e. existing .qsb files with DXBC in them work as-is). But unlike D3D11, this one will try to pick 6.7, 6.6, ..., down to 5.0 from the QShader, in that order. - Uses D3D12MA for suballocating. As a result it can report vmem allocation statistics like the Vulkan backend, and it does more since the DXGI memory usage (incl. implicit resources) is also reported. This is optional technically, so we also have the option of going straight with the heavyweight CreateCommittedResource() instead. That is what we do if the adapter chosen reports it's software-based or when QT_D3D_NO_SUBALLOC=1 is set. - PreferSoftwareRenderer (picking the WARP device) and the env.var. QT_D3D_ADAPTER_INDEX work as with the D3D11 backend. - It is not unexpected that with large scenes that generate lots of draw calls with multiple textures/samplers per call the performance may be slightly below D3D11 (probably mostly due to descriptor management). Similarly, the reported memory usage will be higher, which is partly natural due to creating heaps, descriptor pools, staging areas, etc. upfront. Will need to be evaluated later how these can be tuned. Change-Id: I5a42580bb65f391ebceaf81adc6ae673cceacb74 Reviewed-by: Andy Nichols <andy.nichols@qt.io> Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
2022-08-12 10:01:41 +00:00
case QRhi::D3D12:
setSurfaceType(Direct3DSurface);
break;
case QRhi::Metal:
setSurfaceType(MetalSurface);
break;
default:
break;
}
}
void Window::exposeEvent(QExposeEvent *)
{
// initialize and start rendering when the window becomes usable for graphics purposes
if (isExposed() && !m_running) {
qDebug("init");
m_running = true;
init();
resizeSwapChain();
}
const QSize surfaceSize = m_hasSwapChain ? m_sc->surfacePixelSize() : QSize();
// stop pushing frames when not exposed (or size is 0)
if ((!isExposed() || (m_hasSwapChain && surfaceSize.isEmpty())) && m_running && !m_notExposed) {
qDebug("not exposed");
m_notExposed = true;
}
// Continue when exposed again and the surface has a valid size. Note that
// surfaceSize can be (0, 0) even though size() reports a valid one, hence
// trusting surfacePixelSize() and not QWindow.
if (isExposed() && m_running && m_notExposed && !surfaceSize.isEmpty()) {
qDebug("exposed again");
m_notExposed = false;
m_newlyExposed = true;
}
// always render a frame on exposeEvent() (when exposed) in order to update
// immediately on window resize.
if (isExposed() && !surfaceSize.isEmpty())
render();
}
bool Window::event(QEvent *e)
{
switch (e->type()) {
case QEvent::UpdateRequest:
render();
break;
case QEvent::PlatformSurface:
// this is the proper time to tear down the swapchain (while the native window and surface are still around)
if (static_cast<QPlatformSurfaceEvent *>(e)->surfaceEventType() == QPlatformSurfaceEvent::SurfaceAboutToBeDestroyed)
releaseSwapChain();
break;
default:
break;
}
return QWindow::event(e);
}
void Window::init()
{
QRhi::Flags rhiFlags = QRhi::EnableDebugMarkers;
if (m_graphicsApi == QRhi::Null) {
QRhiNullInitParams params;
m_rhi.reset(QRhi::create(QRhi::Null, &params, rhiFlags));
}
#if QT_CONFIG(opengl)
if (m_graphicsApi == QRhi::OpenGLES2) {
m_fallbackSurface.reset(QRhiGles2InitParams::newFallbackSurface());
QRhiGles2InitParams params;
params.fallbackSurface = m_fallbackSurface.get();
params.window = this;
m_rhi.reset(QRhi::create(QRhi::OpenGLES2, &params, rhiFlags));
}
#endif
#if QT_CONFIG(vulkan)
if (m_graphicsApi == QRhi::Vulkan) {
QRhiVulkanInitParams params;
params.inst = vulkanInstance();
params.window = this;
m_rhi.reset(QRhi::create(QRhi::Vulkan, &params, rhiFlags));
}
#endif
#ifdef Q_OS_WIN
if (m_graphicsApi == QRhi::D3D11) {
QRhiD3D11InitParams params;
params.enableDebugLayer = true;
m_rhi.reset(QRhi::create(QRhi::D3D11, &params, rhiFlags));
rhi: Add D3D12 support - The optional nice-to-haves DebugMarkers, Timestamps, PipelineCache are not yet implemented (features reported as false, to be implemented later, although buffer/texture resource name setting already works as-is, regardless of DebugMarkers). - Mipmap generation for 3D textures is missing. Won't matter much given that 3D textures are not used in Qt for anything atm. For generating mipmaps for 2D (or 2D array) textures, the MiniEngine compute shader and approach is used. 3D support for the mipmap generator may be added later. 1D textures / arrays are supported except for mipmap generation, and so the OneDimensionalTextureMipmaps feature is reported as false. - Qt Quick and Qt Quick 3D are expected to be fully functional. (unforeseen issues are not impossible, of course) - Uses minimum feature level 11.0 when requesting the device. It is expected to be functional on resource binding tier 1 hardware even, although this has not been verified in practice. - 2 frames in flight with the usual resource buffering (QRhiBuffer::Dynamic is host visible (UPLOAD) and always mapped and slotted, other buffers and textures are device local (DEFAULT). Requests 3 swapchain buffers. Swapchains are mostly like with D3D11 (e.g. FLIP_DISCARD and SCALING_NONE). - The root signature generation is somewhat limited by the SPIR-V binding model and that we need to map every binding point using the nativeResourceBindingMap from the QShader. Thus the root signature is laid out so each stage has its own set of resources, with shader register clashes being prevented by setting the visibility to a given stage. Sampler handling is somewhat suboptimal but we are tied by the binding model and existing API design. It is in a fairly special situation due to the 2048 limit on a shader visible sampler heap, as opposed to 1000000 for SRVs and UAVS, so the approach we use for textures (just stage the CPU SRVs on the (per-frame slot) shader visible heap as they are encountered, effectively treating the heap as a ring buffer) would quickly lead to having to switch heaps many times with scenes with many draw calls and sampledTexture/sampler bindings in the srb. Whereas static samplers, which would be beautiful, are impossible to utilize safely since we do not have that concept (i.e. samplers specified upfront, tied to the graphics/compute pipeline) in the QRhi API, and an srb used at pipeline creation may change its associated resources, such as the QRhiSampler reference, by the time the shader resources are set for the draw call (or another, compatible srb may get used altogether), so specifying the samplers at root signature creation time is impossible. Rather, the current approach is to treat each sampler as a separate root parameter (per stage) having a descriptor table with a single entry. The shader visible sampler heap has exactly one instance of each unique sampler encountered during the lifetime of the QRhi. - Shader-wise no different from D3D11, works with HLSL/DXBC 5.0 (i.e. existing .qsb files with DXBC in them work as-is). But unlike D3D11, this one will try to pick 6.7, 6.6, ..., down to 5.0 from the QShader, in that order. - Uses D3D12MA for suballocating. As a result it can report vmem allocation statistics like the Vulkan backend, and it does more since the DXGI memory usage (incl. implicit resources) is also reported. This is optional technically, so we also have the option of going straight with the heavyweight CreateCommittedResource() instead. That is what we do if the adapter chosen reports it's software-based or when QT_D3D_NO_SUBALLOC=1 is set. - PreferSoftwareRenderer (picking the WARP device) and the env.var. QT_D3D_ADAPTER_INDEX work as with the D3D11 backend. - It is not unexpected that with large scenes that generate lots of draw calls with multiple textures/samplers per call the performance may be slightly below D3D11 (probably mostly due to descriptor management). Similarly, the reported memory usage will be higher, which is partly natural due to creating heaps, descriptor pools, staging areas, etc. upfront. Will need to be evaluated later how these can be tuned. Change-Id: I5a42580bb65f391ebceaf81adc6ae673cceacb74 Reviewed-by: Andy Nichols <andy.nichols@qt.io> Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
2022-08-12 10:01:41 +00:00
} else if (m_graphicsApi == QRhi::D3D12) {
QRhiD3D12InitParams params;
params.enableDebugLayer = true;
m_rhi.reset(QRhi::create(QRhi::D3D12, &params, rhiFlags));
}
#endif
#if defined(Q_OS_MACOS) || defined(Q_OS_IOS)
if (m_graphicsApi == QRhi::Metal) {
QRhiMetalInitParams params;
m_rhi.reset(QRhi::create(QRhi::Metal, &params, rhiFlags));
}
#endif
if (!m_rhi)
qFatal("Failed to create RHI backend");
m_sc.reset(m_rhi->newSwapChain());
m_ds.reset(m_rhi->newRenderBuffer(QRhiRenderBuffer::DepthStencil,
QSize(), // no need to set the size here, due to UsedWithSwapChainOnly
1,
QRhiRenderBuffer::UsedWithSwapChainOnly));
m_sc->setWindow(this);
m_sc->setDepthStencil(m_ds.get());
m_rp.reset(m_sc->newCompatibleRenderPassDescriptor());
m_sc->setRenderPassDescriptor(m_rp.get());
customInit();
}
void Window::resizeSwapChain()
{
m_hasSwapChain = m_sc->createOrResize(); // also handles m_ds
const QSize outputSize = m_sc->currentPixelSize();
m_proj = m_rhi->clipSpaceCorrMatrix();
m_proj.perspective(45.0f, outputSize.width() / (float) outputSize.height(), 0.01f, 1000.0f);
m_proj.translate(0, 0, -4);
}
void Window::releaseSwapChain()
{
if (m_hasSwapChain) {
m_hasSwapChain = false;
m_sc->destroy();
}
}
void Window::render()
{
if (!m_hasSwapChain || m_notExposed)
return;
// If the window got resized or newly exposed, resize the swapchain. (the
// newly-exposed case is not actually required by some platforms, but
// f.ex. Vulkan on Windows seems to need it)
//
// This (exposeEvent + the logic here) is the only safe way to perform
// resize handling. Note the usage of the RHI's surfacePixelSize(), and
// never QWindow::size(). (the two may or may not be the same under the hood,
// depending on the backend and platform)
//
if (m_sc->currentPixelSize() != m_sc->surfacePixelSize() || m_newlyExposed) {
resizeSwapChain();
if (!m_hasSwapChain)
return;
m_newlyExposed = false;
}
QRhi::FrameOpResult r = m_rhi->beginFrame(m_sc.get());
if (r == QRhi::FrameOpSwapChainOutOfDate) {
resizeSwapChain();
if (!m_hasSwapChain)
return;
r = m_rhi->beginFrame(m_sc.get());
}
if (r != QRhi::FrameOpSuccess) {
qDebug("beginFrame failed with %d, retry", r);
requestUpdate();
return;
}
customRender();
m_rhi->endFrame(m_sc.get());
rhi: metal: Switch back to presentDrawable This convenience should be, according to the Apple docs, equivalent to calling present from a scheduled handler. (which on its own makes it unclear why we switched in the first place) In practice it seems the two approaches are not identical. It looks like that once a frame is submitted earlier than the next display link callback, the throttling behavior we implement in beginFrame() (waiting on the semaphore for the completion of the appropriate command list etc.) starts exhibiting unexpected behavior, not correctly throttling the thread to the refresh rate. Changing back to presentDrawable does not exhibit this at all. The suspicion is that presentDrawable is probably doing more than what the docs suggest, and so is not fully equivalent to calling present manually from a scheduled handler. Therefore, switch to presentDrawable now, which restores the expected cross-platform behavior, but make a note of the oddity, and also prepare the hellominimalcrossgfxtriangle manual test to provide an easy, self-contained application to allow experimenting in the future, if needed. This allows Qt Quick render thread animations to advance at the expected speed (because the render thread is correctly throttled to the refresh rate), even if the render thread decides to generate a new frame right away, without waiting for the next display link update. Without this patch, attempting to get updates not via requestUpdate(), but by other means (timer etc.) leads to incorrect throttling, and so the triangle in the test app is rotating faster than expected - but only with Metal. Running with OpenGL on macOS or with any API on any other platform the behavior will be correct. Even if scheduling updates without display link is not efficient, and should be discouraged, not doing so cannot break the core contract of vsync throttling, i.e. the thread cannot run faster just because it renders a frame not in response to an UpdateRequest. Amends 98b60450f7ce6b16464392747ab8721f30add15e (effectively reverts but keeps the code and the notes because we might want to clear this up some day) Pick-to: 6.4 6.3 6.2 Fixes: QTBUG-103415 Change-Id: Id3bd43e94785384142337564ce4b2644bf257100 Reviewed-by: Tor Arne Vestbø <tor.arne.vestbo@qt.io>
2022-06-21 08:51:04 +00:00
// Always request the next frame via requestUpdate(). On some platforms this is backed
// by a platform-specific solution, e.g. CVDisplayLink on macOS, which is potentially
// more efficient than a timer, queued metacalls, etc.
//
// However, the rendering behavior is identical no matter how the next round of
// rendering is triggered: the rendering thread is throttled to the presentation rate
// (either in beginFrame() or endFrame()) so the triangle should rotate at the exact
// same speed no matter which approach is taken here.
#if 1
requestUpdate();
rhi: metal: Switch back to presentDrawable This convenience should be, according to the Apple docs, equivalent to calling present from a scheduled handler. (which on its own makes it unclear why we switched in the first place) In practice it seems the two approaches are not identical. It looks like that once a frame is submitted earlier than the next display link callback, the throttling behavior we implement in beginFrame() (waiting on the semaphore for the completion of the appropriate command list etc.) starts exhibiting unexpected behavior, not correctly throttling the thread to the refresh rate. Changing back to presentDrawable does not exhibit this at all. The suspicion is that presentDrawable is probably doing more than what the docs suggest, and so is not fully equivalent to calling present manually from a scheduled handler. Therefore, switch to presentDrawable now, which restores the expected cross-platform behavior, but make a note of the oddity, and also prepare the hellominimalcrossgfxtriangle manual test to provide an easy, self-contained application to allow experimenting in the future, if needed. This allows Qt Quick render thread animations to advance at the expected speed (because the render thread is correctly throttled to the refresh rate), even if the render thread decides to generate a new frame right away, without waiting for the next display link update. Without this patch, attempting to get updates not via requestUpdate(), but by other means (timer etc.) leads to incorrect throttling, and so the triangle in the test app is rotating faster than expected - but only with Metal. Running with OpenGL on macOS or with any API on any other platform the behavior will be correct. Even if scheduling updates without display link is not efficient, and should be discouraged, not doing so cannot break the core contract of vsync throttling, i.e. the thread cannot run faster just because it renders a frame not in response to an UpdateRequest. Amends 98b60450f7ce6b16464392747ab8721f30add15e (effectively reverts but keeps the code and the notes because we might want to clear this up some day) Pick-to: 6.4 6.3 6.2 Fixes: QTBUG-103415 Change-Id: Id3bd43e94785384142337564ce4b2644bf257100 Reviewed-by: Tor Arne Vestbø <tor.arne.vestbo@qt.io>
2022-06-21 08:51:04 +00:00
#else
QTimer::singleShot(0, this, [this] { render(); });
#endif
}
void Window::customInit()
{
}
void Window::customRender()
{
}