Expose double-buffered V-Sync as an option #4065
Comments
Does Vulkan have a way to force the use of double-buffered V-Sync (over triple-buffered)? I remember Vulkan not offering a lot of control in this aspect. See the discussion in the pull request where V-Sync options were reimplemented for Vulkan: godotengine/godot#48622
I could not find a mention of double buffering in that discussion, but unlike OpenGL, Vulkan gives the application control over the buffering. There is also a demo that switches the buffering dynamically.
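For illustration, here is a minimal sketch of how an application can ask Vulkan for double-buffered FIFO (classic V-Sync). This is not Godot code; the function name is made up and error handling is omitted. Note that minImageCount is only a lower bound, so the driver may still create more images than requested.

```cpp
// Sketch: request a double-buffered FIFO (V-Sync) swapchain in Vulkan.
// minImageCount is only a lower bound; the surface's own minimum may be higher.
#include <vulkan/vulkan.h>
#include <algorithm>
#include <cstdint>

VkSwapchainKHR create_double_buffered_swapchain(VkPhysicalDevice physical_device,
                                                VkDevice device,
                                                VkSurfaceKHR surface,
                                                VkSurfaceFormatKHR format,
                                                VkExtent2D extent) {
    VkSurfaceCapabilitiesKHR caps{};
    vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physical_device, surface, &caps);

    // Ask for 2 images, but never fewer than the surface's minimum
    // (and never more than its maximum, if one is reported).
    uint32_t image_count = std::max<uint32_t>(2u, caps.minImageCount);
    if (caps.maxImageCount != 0) {
        image_count = std::min<uint32_t>(image_count, caps.maxImageCount);
    }

    VkSwapchainCreateInfoKHR info{};
    info.sType            = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
    info.surface          = surface;
    info.minImageCount    = image_count;               // double buffering requested here
    info.imageFormat      = format.format;
    info.imageColorSpace  = format.colorSpace;
    info.imageExtent      = extent;
    info.imageArrayLayers = 1;
    info.imageUsage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
    info.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
    info.preTransform     = caps.currentTransform;
    info.compositeAlpha   = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
    info.presentMode      = VK_PRESENT_MODE_FIFO_KHR;  // classic V-Sync
    info.clipped          = VK_TRUE;

    VkSwapchainKHR swapchain = VK_NULL_HANDLE;
    vkCreateSwapchainKHR(device, &info, nullptr, &swapchain);
    return swapchain;
}
```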
Feel free to open a pull request to implement this feature 🙂
I had implemented the naive solution, but it turns out that Mesa enforces a higher minimum image count of its own, so simply requesting two images is not honored.
Unfortunately this does not appear to be the case currently. KhronosGroup/Vulkan-Docs#370 already points to this issue, and a solution has been in the works for over five years now. 😕 Hopefully KhronosGroup/Vulkan-Docs#1364 will get there soon. If done right, it should allow the V-Sync and especially the Adaptive V-Sync options to be implemented such that they offer competitive, often superior, latency compared to Mailbox and V-Sync Off, while exhibiting less stutter and consuming fewer resources.

Reading the Vulkan spec and KhronosGroup/Vulkan-Docs#1137, I got the impression that Godot should not need to request at least 3 images in the swap chain. Unless someone else – with better knowledge of these APIs perhaps – has an idea on how to implement this proposal with what is currently available, I am afraid that it will have to be postponed. 😞

Edit: see also https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12086. I would not be able to properly test a PR that I author.
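For reference, once VK_KHR_present_id / VK_KHR_present_wait are available (this is what the linked Mesa work implements), latency under FIFO can be bounded by waiting for an earlier present to actually reach the display before starting the next frame. A minimal sketch, assuming both extensions were enabled at device creation; the function names and the "wait on frame N-2" policy are illustrative, not what any engine currently does:

```cpp
// Sketch: bound FIFO latency with VK_KHR_present_id + VK_KHR_present_wait.
#include <vulkan/vulkan.h>
#include <cstdint>

uint64_t g_present_id = 0;  // monotonically increasing, one per vkQueuePresentKHR

void present_with_id(VkQueue queue, VkSwapchainKHR swapchain, uint32_t image_index,
                     VkSemaphore render_done) {
    ++g_present_id;

    VkPresentIdKHR present_id_info{};
    present_id_info.sType          = VK_STRUCTURE_TYPE_PRESENT_ID_KHR;
    present_id_info.swapchainCount = 1;
    present_id_info.pPresentIds    = &g_present_id;

    VkPresentInfoKHR present{};
    present.sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    present.pNext              = &present_id_info;
    present.waitSemaphoreCount = 1;
    present.pWaitSemaphores    = &render_done;
    present.swapchainCount     = 1;
    present.pSwapchains        = &swapchain;
    present.pImageIndices      = &image_index;
    vkQueuePresentKHR(queue, &present);
}

// Call at the start of a frame: do not begin CPU work for frame N until the
// present of frame N-2 has reached the display. Keeping the present queue
// this short is what bounds latency under FIFO.
void wait_for_previous_present(VkDevice device, VkSwapchainKHR swapchain) {
    if (g_present_id >= 2) {
        vkWaitForPresentKHR(device, swapchain, g_present_id - 1, UINT64_MAX);
    }
}
```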
Looks like it's implemented in Mesa under X11, and is coming soon to Mesa under Wayland: https://www.phoronix.com/news/Mesa-KHR_present_wait-Wayland

Note that in situations where you can use VRR, I generally recommend using an FPS cap just below the refresh rate instead. This allows for the lowest possible latency while still avoiding tearing, and it handles framerate variations better than any other method. Enforcing double-buffered V-Sync will still be useful in situations where you can't use VRR (for instance, when you want black frame insertion from your display or don't want to see any VRR flicker).
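To illustrate the "FPS cap just below the refresh rate" idea, here is a minimal, engine-agnostic sketch (in Godot the same effect comes from the engine's max FPS setting). The refresh rate and the size of the margin are assumptions for the example:

```cpp
// Sketch: cap the frame rate a few FPS below the display refresh rate so a
// VRR display never saturates and falls back to V-Sync-like queuing.
#include <chrono>
#include <thread>

int main() {
    using Clock = std::chrono::steady_clock;

    const double refresh_hz = 144.0;             // assumed VRR display
    const double cap_hz     = refresh_hz - 3.0;  // cap slightly below refresh
    const auto frame_budget = std::chrono::duration_cast<Clock::duration>(
        std::chrono::duration<double>(1.0 / cap_hz));

    auto next_deadline = Clock::now() + frame_budget;
    for (int frame = 0; frame < 1000; ++frame) {
        // ... simulate and render the frame here ...

        // Sleep until the deadline so frames are spaced 1/cap_hz apart.
        std::this_thread::sleep_until(next_deadline);
        next_deadline += frame_budget;
    }
}
```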
Hi! Since my PR godotengine/godot#80566 will be addressing this for Vulkan (and in theory it should apply to the Metal & D3D12 backends if they make use of the parameter properly), I'll address a few questions here to avoid derailing my PR's discussion (originally my PR fixes a synchronization bug). There are two things that are linked together very tightly but are not exactly the same:
Backbuffers or swapchains

Backbuffer count has been explained since the '90s. Consoles like the original NES had a single front buffer, which is always the one being presented to the screen, and the CPU had to iterate through every pixel faster than it was being sent out to the CRT scan; otherwise visible tearing would appear.

Then double buffering appeared. The CPU/GPU has all the time in the world to draw to the back buffer. Once it's ready, we must wait for the VBLANK interval and swap the front & back buffers; what was once the front buffer is now the back buffer, and is available for rendering the next frame.

Triple buffering uses 1 front buffer and 2 backbuffers, which means the GPU doesn't have to wait for the VBLANK interval: it can start writing into the 2nd backbuffer.

Number of frames the CPU is allowed to run ahead of the GPU

The thing about rendering more than one frame ahead is that we need double (or triple) of a lot of other things! It's not just the swapchain. If we send a world matrix for a draw call, we need to store it somewhere in GPU memory so that our vertex shaders can use it. That means for frame 0 we write into one copy and for frame 1 into another; for frame 3 we must reuse the copy from frame 0, which means waiting until the GPU has finished reading it (a small code sketch of this rotation follows at the end of this comment).

Swapchain count vs buffer count

A high swapchain count allows the GPU to continue while the front buffer is blocked being presented. If the GPU is too slow, the buffer count is going to matter a lot to unblock the CPU. If the GPU is very fast, the swapchain count will dominate latency values.

I wrote a VERY long text. Then realized I was wrong. Then rewrote it again, and realized I was wrong again. I guess true Nirvana is reached when I realize I know nothing. In fact, based on this, I will have to change the PR to expose both settings separately (right now it is set so that kNumSwapchains = kNumBuffers + 1).

The truth is, this has very complex interactions, so I decided to write a simulator for FIFO presentation instead. For example, the following parameters:

```cpp
static const size_t kNumBuffers = 2u;
static const size_t kNumSwapchains = 3u;
static const size_t kVBlank = 16u;
static const size_t kCpuTime = 7u;
static const size_t kGpuTime = 17u;
static const size_t kCpuFrameVariance = 2u;
static const size_t kGpuFrameVariance = 2u;
```

can be interpreted as follows: 2 frames in flight, 3 swapchain images, a 16 ms VBLANK interval (60 Hz), CPU work taking 7 ± 2 ms per frame, and GPU work taking 17 ± 2 ms per frame.
Results:
If we change kNumSwapchains to 2 (double buffer), we get:
And if we use kNumBuffers = 3 & kNumSwapchains = 4:
Avg FPS improved slightly, but avg lag got worse compared to kNumBuffers = 2 & kNumSwapchains = 3.

In this scenario the GPU is struggling to maintain 60 FPS, and triple buffering improved framerate AND lag. However, if we repeat the test with kGpuTime = 12 (that is, between 10 & 14 ms):
Triple buffering improved framerate but made lag much worse. You can download the snippet, compile it locally, and play with the results.
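To make the "number of frames the CPU is allowed to run ahead of the GPU" part above concrete, here is a minimal sketch (not Godot code, and not the PR's implementation) of rotating per-frame resources and waiting on a fence before reusing them. The structure and names are illustrative; resource creation is assumed to happen elsewhere, with the fences created in the signaled state.

```cpp
// Sketch: kNumBuffers "frames in flight". Every resource the CPU writes each
// frame (constant buffers, command buffers, staging memory, ...) exists
// kNumBuffers times; before reusing slot (frame % kNumBuffers) the CPU must
// wait until the GPU has finished the frame that last used that slot.
#include <vulkan/vulkan.h>
#include <array>
#include <cstdint>

constexpr uint32_t kNumBuffers = 2;  // CPU may run at most 2 frames ahead of the GPU

struct PerFrame {
    VkFence         gpu_done       = VK_NULL_HANDLE;  // signaled when the GPU finished this slot
    VkCommandBuffer cmd            = VK_NULL_HANDLE;
    VkBuffer        world_matrices = VK_NULL_HANDLE;  // per-frame copy of dynamic data
};

std::array<PerFrame, kNumBuffers> g_frames;  // created elsewhere (fences start signaled)

void render_frame(VkDevice device, VkQueue queue, uint64_t frame_index) {
    PerFrame &slot = g_frames[frame_index % kNumBuffers];

    // Frame N reuses the resources of frame N - kNumBuffers, so block here
    // until the GPU is done reading them. With kNumBuffers = 2, frame 2
    // waits on frame 0, frame 3 waits on frame 1, and so on.
    vkWaitForFences(device, 1, &slot.gpu_done, VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &slot.gpu_done);

    // ... record slot.cmd and update slot.world_matrices for this frame ...

    VkSubmitInfo submit{};
    submit.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.commandBufferCount = 1;
    submit.pCommandBuffers    = &slot.cmd;
    vkQueueSubmit(queue, 1, &submit, slot.gpu_done);  // fence signals when the GPU finishes
}
```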
OK, one thing I left out from the parts I removed: the main problem this proposal is trying to address is lag. Triple/double buffering is a way of forcing certain behavior that has a tendency to reduce lag as a side effect.

However, if we want really low lag, that can be achieved by measuring frametimes, estimating how long the next frame will take, and sleeping. I talked about this in depth on Stack Overflow. The TL;DR is that IF (big if) we can correctly estimate how long rendering the next frame will take, let's say 10 ms, then we can sleep for another 6 ms so that we start preparing commands as late as possible. This allows us to see keystrokes / mouse clicks etc. that happened during those 6 ms we slept, which would otherwise be delayed until the next time the CPU is free. There is a lot of devil in the details though.

I saw that fighting games like Guilty Gear Xrd took a very silly but good approach: they have a calibration section in the Options and ask the user to press a button until it hits the rhythm. Assuming the system is fast enough to almost always hit VSync, this is a lazy (yet possibly effective) way of calculating how long to sleep. The plumbing behind it boils down to storing a number and then calling Sleep(saved_number) at the beginning of the frame.
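A rough sketch of the "estimate and sleep" idea above (not what the PR implements): keep a running estimate of the CPU+GPU frame cost and, at the top of each frame, sleep until just before the latest moment work can start and still hit the next vblank. The refresh interval, safety margin, and estimator below are all assumptions for illustration.

```cpp
// Sketch: delay the start of frame work so input is sampled as late as
// possible while still (hopefully) hitting the next vblank.
#include <chrono>
#include <thread>
#include <algorithm>

using Clock = std::chrono::steady_clock;
using MsD   = std::chrono::duration<double, std::milli>;

struct FrameScheduler {
    double vblank_ms   = 16.667;  // 60 Hz display (assumed)
    double estimate_ms = 10.0;    // running estimate of CPU+GPU frame cost
    double margin_ms   = 2.0;     // safety margin for estimation error

    // Call at the top of the new frame, right after the previous present,
    // and *before* sampling input.
    void sleep_before_frame(Clock::time_point previous_vblank) {
        const double budget_ms = vblank_ms - estimate_ms - margin_ms;
        if (budget_ms > 0.0) {
            std::this_thread::sleep_until(previous_vblank + MsD(budget_ms));
        }
        // Input sampled after this point is budget_ms fresher than it would
        // be if the frame had started immediately.
    }

    // Feed measured frame costs back in; biasing toward the slowest recent
    // frame reduces the chance of missing vblank after one fast frame.
    void record_frame_cost(double measured_ms) {
        estimate_ms = std::max(measured_ms, 0.9 * estimate_ms + 0.1 * measured_ms);
    }
};
```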
Thanks for the great writeup and simulator 🙂
I wonder how this relates to frame delta smoothing. Can a similar estimation logic be used?
"Yesn't". One would have to see if the logic is useful/reusable, but really the hard part is that we need to measure:
That's why the Guilty Gear Xrd solution is so stupidly simple: since fighting games have a very stable framerate (they display the same two characters throughout the entire session, with the same background, in a controlled scenario), they can just ask the user what feels right until the user manually finds the right amount of time to sleep per frame.
The VSync simulator is now interactive and online.
godotengine/godot#87340 has added the corresponding option.
Describe the project you are working on
Nothing so far, just having a look at Godot. 😃
Describe the problem or limitation you are having in your project
In some scenarios none of the currently exposed presentation strategies is both jitter-free and low-latency.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
If the preparation of a frame consistently takes less time than the refresh interval of the display, then double-buffered V-Sync saves one refresh interval of latency over the currently offered triple-buffered V-Sync (roughly 16.7 ms on a 60 Hz display). Unlike the immediate and mailbox presentation modes, V-Sync has consistent timing and therefore less jitter. With frame scheduling it can sometimes achieve better latency than those as well, depending on the variance of the frame time. Compared to the non-V-Sync modes it also reduces power consumption and component wear.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
For example, a VSYNC_DOUBLE_BUFFERED enumerator for DisplayServer::VSyncMode.
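A rough sketch of what this could look like. The first four enum values mirror DisplayServer::VSyncMode in Godot 4.x; the VSYNC_DOUBLE_BUFFERED value, the choose_present_params() helper, and the image counts are hypothetical illustrations, not existing engine code (the real mapping lives in the rendering drivers).

```cpp
// Sketch: a new V-Sync mode and a hypothetical mapping to Vulkan present
// mode + requested swap chain image count.
#include <vulkan/vulkan.h>
#include <cstdint>

enum VSyncMode {
    VSYNC_DISABLED,         // immediate presentation
    VSYNC_ENABLED,          // FIFO, driver-chosen image count (often 3 today)
    VSYNC_ADAPTIVE,         // FIFO_RELAXED where available
    VSYNC_MAILBOX,          // mailbox
    VSYNC_DOUBLE_BUFFERED,  // proposed: FIFO with only 2 swap chain images
};

struct PresentParams {
    VkPresentModeKHR present_mode;
    uint32_t         min_image_count;
};

// Hypothetical helper: pick present mode and requested image count per mode.
PresentParams choose_present_params(VSyncMode mode) {
    switch (mode) {
        case VSYNC_DISABLED:        return { VK_PRESENT_MODE_IMMEDIATE_KHR,    3 };
        case VSYNC_ADAPTIVE:        return { VK_PRESENT_MODE_FIFO_RELAXED_KHR, 3 };
        case VSYNC_MAILBOX:         return { VK_PRESENT_MODE_MAILBOX_KHR,      3 };
        case VSYNC_DOUBLE_BUFFERED: return { VK_PRESENT_MODE_FIFO_KHR,         2 };
        case VSYNC_ENABLED:
        default:                    return { VK_PRESENT_MODE_FIFO_KHR,         3 };
    }
}
```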
If this enhancement will not be used often, can it be worked around with a few lines of script?
No
Is there a reason why this should be core and not an add-on in the asset library?
Configuration of the swap chain is handled by core, and cannot be done elsewhere.