Vulkan GPU/CPU synchronization is wrong #80550
Comments
We should lower the default
I saw the other thread about latency and I forgot to comment on this. The current implementation is too wrong to be discussing it. Godot right now is doing double buffering (FRAME_LAG = 2). But because of the bug, it is actually somewhere between triple and double buffering (because we write to buffer[2] and right afterwards we wait on fence[0]). Double buffering is the lowest latency that can be reached with reasonable performance. Also, frame_count doesn't matter much as long as frame_count > FRAME_LAG. In fact, in my PR I will rename frame_count because it's wrong and confusing. Perhaps after the problem is fixed, the latency problems are gone. But if not, we will have the proper code to test other values (for instance, FRAME_LAG should not be hardcoded). Using the current nomenclature, we should always request
The original code had two bugs, masked by a third issue:

1. It assumed the swapchain count can be used for proper CPU/GPU synchronization (it can't).
2. It used a double buffer scheme (FRAME_LAG = 2), but the vkWaitForFences/vkResetFences calls were in the wrong order and in the wrong place.
3. The swapchain bug forced a quad-buffer scheme with roughly 1 or 2 frames of wait, which hid all the issues.

This commit gets rid of RenderingDeviceVulkan::frame_count & RenderingDeviceVulkan::frame, which should never have existed, and makes everything depend on VulkanContext::frame_index & VulkanContext::FRAME_LAG.

- Adds the ability to tweak FRAME_LAG from within Project Settings
- Adds new setting display/window/vsync/buffer_count
- Adds new setting display/window/vsync/swapchain_count

See godotengine#80550 for details.

------------------------------------------------------------------

Fix comment grammar. Use Godot's functions instead of std.

------------------------------------------------------------------

Additionally, handle inconsistent state when destroying windows: window.image_acquired_semaphores[i] contains uninitialized values. This means that if VulkanContext::_update_swap_chain had failed early (this can happen for valid reasons and due to unexpected failures), then _clean_up_swap_chain will try to destroy an invalid handle. This fix always sets the handle to VK_NULL_HANDLE. Likewise, perform the checks on various other resources in _clean_up_swap_chain in case the resources were partially initialized and are now in an inconsistent state. The API should guarantee the call to be valid if the handle is VK_NULL_HANDLE, but I don't trust driver implementations to handle that case.

Affects godotengine#81670

Co-authored-by: Clay John <[email protected]>
Is this issue still slated for 4.3, or has it moved to 4.x? This seems deeply related to the AMD issues we're seeing with frame shuffling when vsync is enabled, especially on 5700 XTs.
I tested darksylinc's build of 4.2 tagged 'binary for testers' and still saw the stutter and shuffling issues on my Radeon 5700 XT. The Vulkan renderer also frequently crashes and corrupts the entire display, leading to an irrecoverable driver crash that requires a complete system shutdown. Unsure if it's related.
@DarioSamo @darksylinc Can this be considered fixed by #87340, or only partially? If so, what's the current status? |
The issue described here is solved, as the new synchronization order in the context rewrite is what Matias described as the correct solution. |
Godot version
4.2.x master [16a9356]
System information
Godot v4.2.dev (262d1ea) - Ubuntu 20.04.6 LTS (Focal Fossa) - X11 - Vulkan (Forward+) - dedicated AMD Radeon RX 6800 XT - AMD Ryzen 9 5900X 12-Core Processor (24 Threads)
Issue description
"OOOFFF"
That's the only way I can describe this ticket.
It starts as a simple CPU-to-GPU synchronization bug, but there is also another set of wrong assumptions and bugs which, when combined, make synchronization end up being "more or less" correct.
There are so many things going on I don't know where to start, so I'm going to write this in the order in which I discovered these problems.
At best this is just a waste of VRAM; at worst, many of those DEVICE_LOST reports suddenly start making sense.
A lone validation error that shouldn't have happened
It all started with a random validation error saying a `vkDestroyRenderPass` got called while the render pass was still in use, while trying to repro #80286 on AMD; an error that I could never reproduce again. So I went looking for vkDestroyRenderPass calls in the codebase, and luckily there weren't many.
I'm still evaluating everything to understand what's going on, but it is clear it's wrong (there's no way to sugarcoat it).
That's when I noticed the function `RenderingDeviceVulkan::swap_buffers` submits the frame's work with no wait anywhere in sight. Where's the wait? That's wrong: the code should have been something along the lines of waiting on the frame's fence before reusing that frame's data.
I was about to shout 'Eureka!', but went into swap_buffers() just in case there was already a wait there (although it would be some weird math, because it would have to be `vkWaitForFences(device, 1, &fences[frame + 1])`). That's when it got worse.
An inverted wait
I couldn't find anything in `swap_buffers`, but then where is the `vkWaitForFences` call??? I found it in `VulkanContext::prepare_buffers`. But sadly, it performs the wait in inverted order: the fence is waited on only after the current frame's data has already been written.
This is wrong.
In case someone wonders: for data that is used to communicate between CPU and GPU, we are supposed to have at least 2 pools of memory:
CPU writes to pool 0 while GPU is reading from pool 1, and CPU writes to pool 1 while GPU is reading from pool 0. If the GPU is still reading from pool 0 and CPU wants now to write to pool 0 (it's done writing to 1), it needs to stall and wait.
This is a standard double buffer scheme. A triple buffer scheme is the same, but there is an extra pool to minimize potential stalls.
Unfortunately, with Godot's current code, after running for several frames the CPU ends up writing to a pool while the fence wait that is supposed to protect it happens too late.
And thus we're not actually synchronizing anything! The CPU keeps writing to data that may be in use by the GPU.
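To make the failure concrete, here is a toy model (plain Python written for this explanation; it is not Godot or Vulkan code, and the vkWaitForFences mapping is only an analogy). It mimics the inverted order: the CPU writes a pool first, and the pool's fence is only waited on afterwards:

```python
def inverted_wait_hazards(frames, FRAME_LAG=2):
    """Count CPU-write hazards when the fence wait comes AFTER the write."""
    in_flight = []  # (frame, pool) pairs the GPU may still be reading
    hazards = 0
    for f in range(frames):
        pool = f % FRAME_LAG
        # The CPU writes the pool FIRST: hazard if the GPU still reads it.
        hazards += any(p == pool for _, p in in_flight)
        in_flight.append((f, pool))
        # Only now does the wait happen (the vkWaitForFences analogue):
        # it retires the older frame that used this pool, which is too
        # late to protect the write we just did.
        in_flight = [(g, p) for g, p in in_flight if p != pool or g == f]
    return hazards

# With FRAME_LAG = 2, every frame from the 3rd onwards is a hazard.
print(inverted_wait_hazards(16))
```

In this model the wait still executes every frame, yet it never protects anything: by the time it runs, the CPU has already scribbled over memory the GPU may be reading.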
A simple possible solution is to wait on the frame's fence (and then reset it) at the start of the frame, before the CPU writes anything.

For best performance, it's often suggested to delay the vkWaitForFences() as much as possible (i.e. until it's actually needed for us to write to a buffer from the CPU, destroy a resource, etc.). But given the chaotic complexity of Godot's code, it would seem this is the only safe solution that won't result in more hazards.
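The same toy model (illustrative Python, not engine code) shows that moving the wait to the start of the frame, before any CPU writes, removes the hazard:

```python
def wait_first_hazards(frames, FRAME_LAG=2):
    """Count CPU-write hazards when each frame waits on its pool's
    fence BEFORE writing, i.e. the simple solution described above."""
    in_flight = []  # (frame, pool) pairs the GPU may still be reading
    hazards = 0
    for f in range(frames):
        pool = f % FRAME_LAG
        # vkWaitForFences analogue: retire the previous frame that used
        # this pool before the CPU touches it (then the fence is reset).
        in_flight = [(g, p) for g, p in in_flight if p != pool]
        hazards += any(p == pool for _, p in in_flight)  # always false now
        in_flight.append((f, pool))  # submit; the GPU may lag behind
    return hazards

print(wait_first_hazards(64))  # no hazards, regardless of GPU lag
```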
Sadly, the report doesn't end here. There is more. Godot doesn't have 2 pools, it's got 4.
Wrong assumptions on Swapchains
`RenderingDeviceVulkan::frame_count` is set to the swapchain's image count. The comment justifying that is HUGE, and it rests on a false assumption.
Neither `vkAcquireNextImageKHR` nor `vkQueuePresentKHR` has an obligation to synchronize with the CPU. They use VkSemaphores, which synchronize GPU work with other GPU work (e.g. so that a window isn't presented to the screen while the GPU is still working on it, which would produce an effect similar to turning VSync off).
However, in practice most (all???) drivers do stall the CPU due to limitations in SW or HW architectures. Linux+X11 is notorious for stalling inside `vkAcquireNextImageKHR`, while other drivers prefer to stall inside `vkQueuePresentKHR` instead.

Since we are debunking false assumptions: we also don't have the guarantee that `vkAcquireNextImageKHR` will return swapchain indices in order (e.g. 0, 1, 2, 3, 0, 1, 2, 3). It could return them out of order, or even return the same ones multiple times (e.g. 2, 0, 2, 0, 1).

We could set frame_count as high as 30, and we still wouldn't have the guarantee that CPU and GPU work are synchronized. We need a fence.
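To illustrate with a toy model (Python written for this explanation, not engine code): if per-frame pools were guarded only by the acquired image index, a perfectly legal but repeating acquisition order would already cause reuse hazards, no matter how many images the swapchain has:

```python
def acquire_indexed_hazards(acquire_seq, max_in_flight=3):
    """Count hazards when pools are picked by swapchain image index and
    only backpressure (not a per-pool fence) limits frames in flight."""
    in_flight = []  # (frame, pool) pairs the GPU may still be reading
    hazards = 0
    for f, pool in enumerate(acquire_seq):
        hazards += any(p == pool for _, p in in_flight)
        in_flight.append((f, pool))
        del in_flight[:-max_in_flight]  # at most N frames pending
    return hazards

# The legal-but-unordered sequence mentioned above races,
# while a strictly ordered one happens not to:
print(acquire_indexed_hazards([2, 0, 2, 0, 1]))
print(acquire_indexed_hazards([0, 1, 2, 3] * 4))
```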
Swapchains should not dictate the number of pools
As I said earlier, in a double buffer scheme we should have 2 pools of memory (`FRAME_LAG = 2`). But because of the wrong assumption about swapchains, Godot keeps 4 pools of memory (because `frame_count = 4` on most machines).

The swapchain count should not be used for anything other than tracking swapchain-related data, such as the VkRenderPass.
Combine these 2 bugs and it works
So we have 2 competing implementations and 2 bugs:
- `VulkanContext::prepare_buffers` uses `vkWaitForFences`, but the calls are out of order.
- `VulkanContext` uses the swapchain count, ending up with 4 pools instead of 2.

I've run spreadsheet simulations of what happens when we run our current `vkWaitForFences` code with just 2 fence values and `frame_count = 4`, and it would seem that this just happens to synchronize correctly: a happy accident. Because Godot will stall when starting the 3rd frame, and it keeps 4 pools of data, the data should be safe for access on the 3rd frame.
However, the complexity of all this is too large, so it's possible that I missed something and we may still get a race condition (that lone `vkDestroyRenderPass` error that sparked this investigation must've come from somewhere!), just a rare one.

I also noticed that if, for some God-forsaken reason, `frame_count = 3` or lower, then we will have race conditions.

I analyzed the surrounding code behind the `frame_count` setup, and this is possible but improbable. We set `desiredNumOfSwapchainImages = 3`, and I don't think there are implementations that have `surfCapabilities.maxImageCount = 2`; however, I could be wrong. Also, if the driver returns 2 in `vkGetSwapchainImagesKHR` despite us asking for at least 3 (which AFAIK would be a driver bug), we're still in trouble.

However, it would make sense that those users who can repro `DEVICE_LOST` with high frequency happen to have `frame_count <= 3`.

I didn't analyze what happens when `separate_present_queue == true`. In fact, this is so rare that I don't think this code has been tested at all, and we have no way to verify yet whether the bug reports involved it.

Steps to reproduce
This bug is rare to repro, but it should be much easier to repro if we force `frame_count` to a value lower than 4.
Minimal reproduction project
None.