Queue Submission tracking no longer retiring correctly when Timeline Semaphores are reused across queues #3590
Comments
I was able to reproduce it with different queues of the same family, it was just a tricky timing to hit with my app. In the end, I think it can happen with any pair of queues, regardless of their families.
That's good! It would be really strange if it required separate queue families. I'll try to come up with a test case that reproduces the problem based on your description.
Yep, I expected that if my hypothesis is true, this should be reproducible within the same queue family, but it was harder to do so in my setup. By the way, I think the Imagine a timeline semaphore that starts at 1 and is incremented by So the entire timeline would look like this:
From the app's perspective, the value being The call to |
@farnoy, I'm finally getting around to looking at this in detail. I'm not sure if the scenario you're describing in the original report is valid usage. (I'm also not sure if validation is handling it correctly). Let me restate it and try to explain the problem:
The problem is that nothing in the above ensures that the commands in queue A have actually executed. If A is extremely busy, and B is not, the work on A that will signal value =1 could still have not executed yet. But the vkWaitSemaphores() call would return because B set value = 2. Even worse, the implementation would have to have some way to ensure that the timeline value didn't go backwards when A finally executed and tried to set value = 1. This condition is
The only way this could work reliably is if B used the semaphore as a wait semaphore with value = 1 BEFORE using it as a signal semaphore with value = 2. Does that make sense to you? Like I said, I think validation isn't handling this situation correctly, but if it were, you should be getting 03242. I'll set up a test case to trigger it, as well as the correct sequence with the wait semaphore between the 2 signals.
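The ordering argument above can be sketched as a small model in plain C++ with no Vulkan calls. `TimelineSemaphore`, `Submission`, and `TryExecute` are hypothetical names made up for this illustration, and the counter starting at 0 is an assumption. It only shows why B has to wait on value = 1 before signaling value = 2; it says nothing about how a driver or the validation layers implement this.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of a timeline semaphore: a counter that may only increase.
struct TimelineSemaphore {
    std::uint64_t value = 0;  // assumed initial value for this sketch

    // Returns false if the signal would not strictly increase the counter.
    bool Signal(std::uint64_t v) {
        if (v <= value) return false;
        value = v;
        return true;
    }
};

// Hypothetical submission: an optional wait value, then a signal value.
struct Submission {
    std::optional<std::uint64_t> wait_value;
    std::uint64_t signal_value;
};

enum class Result { Executed, Blocked, InvalidSignal };

// Runs the submission if its wait (if any) is already satisfied.
Result TryExecute(TimelineSemaphore& sem, const Submission& s) {
    if (s.wait_value && sem.value < *s.wait_value) return Result::Blocked;
    return sem.Signal(s.signal_value) ? Result::Executed : Result::InvalidSignal;
}
```

Without the wait, B's signal of 2 can run first and A's later signal of 1 would move the counter backwards. With `wait_value = 1` on B, B stays blocked until A has signaled, so the counter only ever goes 1 and then 2.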
That's my bad, I missed this detail when I wrote down the steps, but yes, I was implicitly assuming there was a dependency between #1 and #2, otherwise it's not a valid usage as you've found. I'm confident my app was also inserting this dependency which is why it validated on the previous SDK & there's no corruption. After adding the dependency between steps 1 and 2, were you able to reproduce the issue? I think you need a separate gadget, like a |
Yes, I'm able to reproduce this now, with 2 queues and 2 command buffers that do vkCmdCopyBuffer().
Make all non-const member data private and restructure semaphore tracking to allow it to eventually be made thread safe. The biggest change is that each semaphore's state now keeps a queue of pending operations, which removes the need to frequently walk all pending submissions in every queue. Fixes KhronosGroup#3590, which was introduced when one of the 'for every queue' loops was removed in a previous commit. This change also adds checking for VUID-VkAcquireNextImageInfoKHR-semaphore-01781 and VUID-vkAcquireNextImageKHR-semaphore-01779.
This test is a reproduction case for Issue KhronosGroup#3590
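As a rough illustration of the "queue of pending operations per semaphore" idea in the commit message, here is a minimal sketch. `PendingOp`, `SemaphoreState`, `EnqueueSignal`, and `Retire` are hypothetical names for this example and are not taken from the validation layers source; the sketch also assumes signals are enqueued in increasing payload order.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical record of one pending signal on a timeline semaphore.
struct PendingOp {
    int queue_id;           // which queue submitted the signal
    std::uint64_t seq;      // submission sequence number on that queue
    std::uint64_t payload;  // timeline value the submission will signal
};

// Hypothetical per-semaphore state that owns its own pending-operation queue.
class SemaphoreState {
  public:
    void EnqueueSignal(int queue_id, std::uint64_t seq, std::uint64_t payload) {
        pending_.push_back(PendingOp{queue_id, seq, payload});
    }

    // The semaphore was observed at `reached`: pop every operation whose
    // payload is covered, regardless of which queue submitted it.
    std::vector<PendingOp> Retire(std::uint64_t reached) {
        std::vector<PendingOp> done;
        while (!pending_.empty() && pending_.front().payload <= reached) {
            done.push_back(pending_.front());
            pending_.pop_front();
        }
        return done;
    }

    std::size_t PendingCount() const { return pending_.size(); }

  private:
    std::deque<PendingOp> pending_;
};
```

Because the semaphore itself records which queue and submission each pending signal belongs to, the caller can retire work on every affected queue without scanning all queues' submission lists.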
Describe the Issue
I've updated my local SDK from 1.2.189.2 to 1.2.198.1 and started getting errors like "resource still in use" where I did not get them before. I did some investigation and I think it's related to the recent refactoring of Queue submission state tracking.
@jeremy-lunarg. EDIT: I pinged the wrong Jeremy, sorry! @jeremyg-lunarg
The current version seems to be retiring work only on the specific queue that is going to signal this semaphore (not 100% sure):
Vulkan-ValidationLayers/layers/queue_state.cpp
Line 161 in 5764298
Whereas the previous version, before it was refactored, iterated all the queues to check what work can be retired as of this semaphore:
Vulkan-ValidationLayers/layers/queue_state.cpp
Line 224 in 7801b41
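The difference between the two behaviours described above can be sketched with a toy model. This is only an illustration of the hypothesis in this report, not the code at either of the referenced lines; `Submission`, `Queue`, `RetireOnQueue`, and `RetireOnAllQueues` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical submission that signals `signal_value` on one timeline semaphore.
struct Submission {
    int semaphore_id;
    std::uint64_t signal_value;
    bool retired = false;
};

struct Queue {
    std::vector<Submission> submissions;
};

// Retires matching submissions on a single queue; returns how many were retired.
std::size_t RetireOnQueue(Queue& q, int semaphore_id, std::uint64_t reached) {
    std::size_t count = 0;
    for (Submission& s : q.submissions) {
        if (!s.retired && s.semaphore_id == semaphore_id && s.signal_value <= reached) {
            s.retired = true;
            ++count;
        }
    }
    return count;
}

// Retires matching submissions on every queue, as the older code is described as doing.
std::size_t RetireOnAllQueues(std::vector<Queue>& queues, int semaphore_id,
                              std::uint64_t reached) {
    std::size_t count = 0;
    for (Queue& q : queues) count += RetireOnQueue(q, semaphore_id, reached);
    return count;
}
```

If queue A signals value 1 and queue B signals value 2 on the same semaphore, retiring only on queue B after observing value 2 leaves A's submission marked as in flight, which would match the "resource still in use" errors.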
With this observation in mind, I modified my app to never reuse the same Timeline Semaphore for submissions across different queue families. It normally does a dependency analysis to pack submissions onto the smallest number of timeline semaphores (unique paths through a DAG).
This resolved my issue and I haven't seen it since. I don't have a simple C++ repro, but I believe this issue arises when:
vkWaitSemaphores is used on the host to wait for the first submission to complete
I'm not sure if the validation layers end up retiring the second submission on queue B correctly. The errors I saw implied they haven't retired the first one.
Interestingly, I could not reproduce this with multiple queues of the same family. My graph analysis also finds independent compute work and submits them to different queues within the same family, but that doesn't seem to cause an issue.
Valid Usage ID
All the ones for "resource still in use" could probably get this. I've personally seen errors from vkResetCommandPool and vkDestroyAccelerationStructure:
VUID-vkResetCommandPool-commandPool-00040
Environment:
Additional context