Tested versions
Reproducible in latest git
System information
Godot v4.4.dev (9e60984) - Windows 10.0.22631 - Multi-window, 2 monitors - Direct3D 12 (Forward+) - dedicated NVIDIA GeForce RTX 3070 (NVIDIA; 32.0.15.6614) - AMD Ryzen 7 5800X3D 8-Core Processor (16 threads)
Issue description
This bug was found independently by at least two other people, but up until now hasn't been documented in the issue tracker.
While testing display latency using the latency tester, I noticed that both Vulkan and D3D12 have one additional frame of latency versus the Compatibility (OpenGL) renderer, even when queueing up the same amount of frames. I decided to run the tester through the PIX graphics debugger on D3D12 to see what I could find.
With a double-buffered frame queue: [PIX timing capture screenshot]

Forcing a single-buffered frame queue: [PIX timing capture screenshot]
Specifically, notice the blue markers on the Monitor row. Even when we don't buffer any additional frames, we end up waiting until the next V-Sync before we execute the command list. And once those commands do execute, we still have to wait for another V-Sync before we see the results. So we end up waiting two V-Syncs even though we should only wait for one. If we can eliminate the unnecessary V-Sync wait, we should be able to save a frame of latency without impacting parallelism (in fact, performance may actually improve).
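For illustration, here is a minimal sketch of one known way to eliminate that first wait on D3D12: DXGI's waitable-swapchain pattern, where the CPU blocks on a frame-latency handle before recording, so the submitted command list never stalls in the GPU queue waiting for a backbuffer. This is not Godot's current code; `record_and_submit_frame` is a hypothetical helper.

```cpp
#include <dxgi1_3.h>
#include <windows.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: records the frame's command list and calls
// ExecuteCommandLists() on the direct queue.
void record_and_submit_frame();

void present_loop(ComPtr<IDXGISwapChain2> swapchain) {
    // Assumes the swapchain was created with
    // DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT in
    // DXGI_SWAP_CHAIN_DESC1::Flags.
    swapchain->SetMaximumFrameLatency(1); // at most one frame in flight
    HANDLE frame_latency = swapchain->GetFrameLatencyWaitableObject();

    for (;;) {
        // Block on the CPU until a backbuffer is actually free, so the
        // command list is submitted only when it can run immediately,
        // instead of sitting in the GPU queue until the next V-Sync.
        WaitForSingleObjectEx(frame_latency, 1000, TRUE);
        record_and_submit_frame();
        swapchain->Present(1, 0); // V-Sync'd present
    }
}
```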
Oddly enough, when we have a separate present queue via `#define FORCE_SEPARATE_PRESENT_QUEUE 1`, the latency is worse, at 2 added frames.

#99257 has a proposed fix that splits the rendering work and presentation into separate command lists, but it does not seem to solve the issue on my system (Windows 11 desktop, D3D12, NVIDIA). It does improve the latency when forcing a separate present queue, but it does not completely eliminate the added latency, as the command list is still waiting on the swapchain.
One possible cause of this issue, even with the above PR, might be that only one framebuffer is used in total instead of one framebuffer per swapchain image. If that single framebuffer is busy, waiting on the next available swapchain image, that would explain why none of the command lists can start until the next V-Sync.
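If that hypothesis holds, the fix would look roughly like the sketch below: one set of resources per swapchain image, indexed by `GetCurrentBackBufferIndex()`, so recording for image N can start while image N-1 is still queued for presentation. Names like `FrameResources` and `kSwapchainImageCount` are illustrative, not Godot's actual RenderingDevice internals.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

constexpr UINT kSwapchainImageCount = 3; // illustrative count

// One set of resources per swapchain image, instead of a single shared
// framebuffer that every frame must wait on.
struct FrameResources {
    ComPtr<ID3D12Resource> backbuffer;
    D3D12_CPU_DESCRIPTOR_HANDLE rtv;          // RTV for this image
    ComPtr<ID3D12CommandAllocator> allocator; // per-image allocator
};
FrameResources g_frames[kSwapchainImageCount];

void record_frame(IDXGISwapChain3 *swapchain,
                  ID3D12GraphicsCommandList *list) {
    // Pick the resources that belong to the image we are about to render.
    UINT index = swapchain->GetCurrentBackBufferIndex();
    FrameResources &frame = g_frames[index];

    // Real code must also fence on GPU completion for *this image* before
    // resetting its allocator; omitted here for brevity.
    frame.allocator->Reset();
    list->Reset(frame.allocator.Get(), nullptr);
    list->OMSetRenderTargets(1, &frame.rtv, FALSE, nullptr);
    // ... record draws, transition backbuffer to PRESENT, close, execute.
}
```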
Note to bugsquad: While #75830 also deals with latency in RenderingDevice, this issue is meant to track and find a solution for one specific cause. I have identified at least three or four specific areas for improvement in RenderingDevice latency.
Steps to reproduce
1. Add `rendering_device/driver.windows="d3d12"` to `project.godot` (see the snippet after this list).
2. Run the project through the PIX graphics debugger (make sure "Launch For GPU Capture" is unchecked).
3. Make sure the Timing Capture options include all the GPU checkboxes, and click Start Timing Capture.
4. Stop the capture after some amount of time.
5. Zoom in on the timeline and click on one of the Command List blocks in the API Queue row, then follow the arrows. If you don't see the arrows, enable the Thread rows one by one in the Lane Selector on the right until they appear.
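For reference, assuming Godot 4's standard settings layout, the override from step 1 ends up in `project.godot` as:

```ini
[rendering]
rendering_device/driver.windows="d3d12"
```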
Minimal reproduction project (MRP)
https://github.com/KeyboardDanni/godot-latency-tester