Tested versions
Reproducible in latest git
System information
Godot v4.4.dev (9e60984) - Windows 10.0.22631 - Multi-window, 2 monitors - Direct3D 12 (Forward+) - dedicated NVIDIA GeForce RTX 3070 (NVIDIA; 32.0.15.6614) - AMD Ryzen 7 5800X3D 8-Core Processor (16 threads)
Issue description
This bug was found independently by at least two other people, but up until now hasn't been documented in the issue tracker.
While testing display latency using the latency tester, I noticed that both Vulkan and D3D12 have one additional frame of latency versus the Compatibility (OpenGL) renderer, even when queueing up the same amount of frames. I decided to run the tester through the PIX graphics debugger on D3D12 to see what I could find.
With a double-buffered frame queue: [PIX timing capture screenshot]

Forcing a single-buffered frame queue: [PIX timing capture screenshot]
Specifically, notice the blue markers on the Monitor row. Even when we don't buffer any additional frames, we end up waiting until the next V-Sync before we execute the command list. And once those commands do execute, we still have to wait for another V-Sync before we see the results. So we end up waiting two V-Syncs even though we should only wait for one. If we can eliminate the unnecessary V-Sync wait, we should be able to save a frame of latency without impacting parallelism (in fact, performance may actually improve).
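For illustration, here is a minimal sketch of one known way to eliminate that first wait on D3D12: DXGI's waitable-swapchain pattern, where the CPU blocks on a frame-latency handle before recording, so the submitted command list never stalls in the GPU queue waiting for a backbuffer. This is not Godot's current code; `record_and_submit_frame` is a hypothetical helper.

```cpp
#include <dxgi1_3.h>
#include <windows.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: records the frame's command list and calls
// ExecuteCommandLists() on the direct queue.
void record_and_submit_frame();

void present_loop(ComPtr<IDXGISwapChain2> swapchain) {
    // Assumes the swapchain was created with
    // DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT in
    // DXGI_SWAP_CHAIN_DESC1::Flags.
    swapchain->SetMaximumFrameLatency(1); // at most one frame in flight
    HANDLE frame_latency = swapchain->GetFrameLatencyWaitableObject();

    for (;;) {
        // Block on the CPU until a backbuffer is actually free, so the
        // command list is submitted only when it can run immediately,
        // instead of sitting in the GPU queue until the next V-Sync.
        WaitForSingleObjectEx(frame_latency, 1000, TRUE);
        record_and_submit_frame();
        swapchain->Present(1, 0); // V-Sync'd present
    }
}
```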
Oddly enough, when we have a separate present queue via `#define FORCE_SEPARATE_PRESENT_QUEUE 1`, the latency is worse, at 2 added frames.

#99257 has a proposed fix that splits the rendering work and presentation into separate command lists, but it does not seem to solve the issue on my system (Windows 11 desktop, D3D12, NVIDIA). It does improve the latency when forcing a separate present queue, but it does not completely eliminate the added latency, as the command list is still waiting on the swapchain.
One possible cause of this issue, even with the above PR, might be that only one framebuffer is used in total instead of one framebuffer per swapchain image. If that single framebuffer is busy, waiting on the next available swapchain image, that would explain why none of the command lists can start until the next V-Sync.
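If that hypothesis holds, the fix would look roughly like the sketch below: one set of resources per swapchain image, indexed by `GetCurrentBackBufferIndex()`, so recording for image N can start while image N-1 is still queued for presentation. Names like `FrameResources` and `kSwapchainImageCount` are illustrative, not Godot's actual RenderingDevice internals.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

constexpr UINT kSwapchainImageCount = 3; // illustrative count

// One set of resources per swapchain image, instead of a single shared
// framebuffer that every frame must wait on.
struct FrameResources {
    ComPtr<ID3D12Resource> backbuffer;
    D3D12_CPU_DESCRIPTOR_HANDLE rtv;          // RTV for this image
    ComPtr<ID3D12CommandAllocator> allocator; // per-image allocator
};
FrameResources g_frames[kSwapchainImageCount];

void record_frame(IDXGISwapChain3 *swapchain,
                  ID3D12GraphicsCommandList *list) {
    // Pick the resources that belong to the image we are about to render.
    UINT index = swapchain->GetCurrentBackBufferIndex();
    FrameResources &frame = g_frames[index];

    // Real code must also fence on GPU completion for *this image* before
    // resetting its allocator; omitted here for brevity.
    frame.allocator->Reset();
    list->Reset(frame.allocator.Get(), nullptr);
    list->OMSetRenderTargets(1, &frame.rtv, FALSE, nullptr);
    // ... record draws, transition backbuffer to PRESENT, close, execute.
}
```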
Note to bugsquad: While #75830 also deals with latency in RenderingDevice, this issue is meant to track and find a solution for one specific cause. I have identified at least three or four specific areas for improvement in RenderingDevice latency.
Steps to reproduce
1. Add `rendering_device/driver.windows="d3d12"` to `project.godot` (see the snippet after this list).
2. Run the project through the PIX graphics debugger (make sure "Launch For GPU Capture" is unchecked).
3. Make sure the Timing Capture options include all the GPU checkboxes, and click Start Timing Capture.
4. Stop the capture after some amount of time.
5. Zoom in on the timeline and click on one of the Command List blocks in the API Queue row, then follow the arrows. If you don't see the arrows, enable the Thread rows one by one in the Lane Selector on the right until they appear.
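For reference, assuming Godot 4's standard settings layout, the override from step 1 ends up in `project.godot` as:

```ini
[rendering]
rendering_device/driver.windows="d3d12"
```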
Minimal reproduction project (MRP)
https://github.com/KeyboardDanni/godot-latency-tester