RenderingDevice: Extraneous wait on swapchain adds 1-2 frames of display latency #100025

Open
KeyboardDanni opened this issue Dec 4, 2024 · 0 comments

Contributor

KeyboardDanni commented Dec 4, 2024

Tested versions

Reproducible in latest git

System information

Godot v4.4.dev (9e60984) - Windows 10.0.22631 - Multi-window, 2 monitors - Direct3D 12 (Forward+) - dedicated NVIDIA GeForce RTX 3070 (NVIDIA; 32.0.15.6614) - AMD Ryzen 7 5800X3D 8-Core Processor (16 threads)

Issue description

This bug was found independently by at least two other people, but up until now hasn't been documented in the issue tracker.

While testing display latency using the latency tester, I noticed that both Vulkan and D3D12 have one additional frame of latency versus the Compatibility (OpenGL) renderer, even when queueing up the same number of frames. I decided to run the tester through the PIX graphics debugger on D3D12 to see what I could find.

With a double-buffered frame queue:
[Image: PIX timing capture with a double-buffered frame queue]

Forcing a single-buffered frame queue:
[Image: PIX timing capture with a forced single-buffered frame queue]

Specifically, notice the blue markers on the Monitor row. Even when we don't buffer any additional frames, we end up waiting until the next V-Sync before we execute the command list. And once those commands do execute, we still have to wait for another V-Sync before we see the results. So we end up waiting two V-Syncs even though we should only wait for one. If we can eliminate the unnecessary V-Sync wait, we should be able to save a frame of latency without impacting parallelism (in fact, performance may actually improve).
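To put the cost in concrete terms, here is a minimal sketch of the arithmetic. The helper names are made up for illustration and are not part of Godot. It only models the time spent waiting on V-Syncs and ignores CPU and GPU work time.

```cpp
#include <cassert>

// Time between two consecutive V-Syncs, in milliseconds.
inline double vsync_interval_ms(double refresh_hz) {
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}

// Latency contributed purely by waiting on `vsync_waits` V-Syncs.
// Simplified model: rendering time and scanout time are not included.
inline double vsync_wait_latency_ms(int vsync_waits, double refresh_hz) {
    assert(vsync_waits >= 0);
    return vsync_waits * vsync_interval_ms(refresh_hz);
}
```

Under this model, the difference between `vsync_wait_latency_ms(2, hz)` (what the capture shows) and `vsync_wait_latency_ms(1, hz)` (what we should see) is one full refresh interval, which is roughly 16.7 ms on a 60 Hz display.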

Oddly enough, when we have a separate present queue via #define FORCE_SEPARATE_PRESENT_QUEUE 1, the latency is worse, at 2 added frames.

#99257 proposes a fix that splits the rendering work and presentation into separate command lists, but it does not seem to solve the issue on my system (Windows 11 desktop, D3D12, NVIDIA). It does improve latency when forcing a separate present queue, but it does not completely eliminate the added latency, because the command list is still waiting on the swapchain.

One possible cause of this issue, even with the above PR, might be that only one framebuffer is used in total instead of one framebuffer per swapchain image. If that single framebuffer is busy waiting on the next available swapchain image, that would explain why none of the command lists can start until the next V-Sync.
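As a rough illustration of what "one framebuffer per swapchain image" could look like, here is a small sketch. All types and names are hypothetical stand-ins rather than Godot's actual RenderingDevice code, and I have not verified that this change removes the wait.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU framebuffer handle.
struct Framebuffer {
    std::uint32_t image_index; // Swapchain image this framebuffer renders into.
};

// Hypothetical cache holding one framebuffer per swapchain image, so that
// recording for a newly acquired image does not reuse a framebuffer that is
// still tied to a different image.
class SwapchainFramebuffers {
public:
    explicit SwapchainFramebuffers(std::uint32_t image_count) {
        framebuffers_.reserve(image_count);
        for (std::uint32_t i = 0; i < image_count; ++i) {
            framebuffers_.push_back(Framebuffer{i});
        }
    }

    // Look up the framebuffer that matches the acquired swapchain image.
    const Framebuffer &for_image(std::uint32_t acquired_index) const {
        assert(acquired_index < framebuffers_.size());
        return framebuffers_[acquired_index];
    }

    std::size_t count() const { return framebuffers_.size(); }

private:
    std::vector<Framebuffer> framebuffers_;
};
```

The idea is that each frame picks its framebuffer by the acquired image index instead of sharing a single one across all images.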

Note to bugsquad: While #75830 also deals with latency in RenderingDevice, this issue is meant to track and find a solution for one specific cause. I have identified at least three or four specific areas for improvement in RenderingDevice latency.

Steps to reproduce

  • Grab https://github.com/KeyboardDanni/godot-latency-tester and add rendering_device/driver.windows="d3d12" to project.godot.
  • Run the project through the PIX graphics debugger (make sure "Launch For GPU Capture" is unchecked).
  • Make sure the Timing Capture options include all the GPU checkboxes, and click Start Timing Capture.
  • Stop the capture after some amount of time.
  • Zoom in on the timeline and click on one of the Command List blocks in the API Queue row, then follow the arrows. If you don't see the arrows, enable the Thread rows one by one in the Lane Selector on the right until they appear.
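For the first step, the setting goes under the rendering section of project.godot. A sketch of the relevant fragment, assuming the section does not already contain this key (if a [rendering] section already exists, add the line there instead of creating a second section):

```ini
[rendering]

rendering_device/driver.windows="d3d12"
```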

Minimal reproduction project (MRP)

https://github.com/KeyboardDanni/godot-latency-tester
