Vulkan Validation Error: Cannot free VkBuffer that is in use by a command buffer. #1689

Closed
Imberflur opened this issue Jul 19, 2021 · 22 comments
Labels: api: vulkan, type: bug

Comments

@Imberflur
Contributor

Description
Vulkan validation error:

ERROR gfx_backend_vulkan: 
VALIDATION [VUID-vkDestroyBuffer-buffer-00922 (0xe4549c11)] : Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x4ce96a00000014e6, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x4ce96a00000014e6[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
object info: (type: BUFFER, hndl: 0x4ce96a00000014e6)

I think this occurs in Veloren when a switch between scenes is initiated and then quickly interrupted, since I get it after being kicked to the character selection screen by an error from the server.

Repro steps
Attached API trace

Expected vs observed behavior
No validation errors

Extra materials
wgpu-trace.zip
wgpu-trace.z01.zip
wgpu-trace.z02.zip
You should be able to extract these by removing the .zip suffix from the last two files and running unzip wgpu-trace.zip

Platform

OS: Manjaro 21.1.0 Pahvo
DE: Xfce4
GPU: AMD Radeon HD 7900 Series (TAHITI, DRM 3.40.0, 5.10.49-1-MANJARO, LLVM 12.0.0)
wgpu commit: a92b8549a8e2cb9dac781bafc5ed32828f3caf46
@cwfitzgerald cwfitzgerald added the type: bug Something isn't working label Jul 19, 2021
@kvark
Member

kvark commented Jul 20, 2021

I'm following your instructions, but unzip complains loudly and fails to properly unpack them. Maybe try https://wormhole.app/ upload?

@Imberflur
Contributor Author

@kvark thanks for the tip, here is the link (it should last 24 hours; let me know if another upload is needed)
https://wormhole.app/d9kpY#oEaxV-v-Moh-FppzVmSREA

@kvark
Member

kvark commented Jul 26, 2021

Strangely, I downloaded this "wgpu-trace-whole.zip", unpacked it, and it's still incomplete. The data indices start at 2500 or so. Not sure what's going on.

@kvark kvark added this to the Release v0.10 milestone Jul 28, 2021
@Imberflur
Contributor Author

Hmm, I must not have put it back together correctly. I will try to get the original or re-create it.

@Imberflur
Contributor Author

Hopefully this one works https://wormhole.app/aRbrY#f0nTkJk-1F7iDug3XnTYUA (sorry for the issues)

@Imberflur
Contributor Author

I am getting this issue when running some of the examples as tests (e.g. boids, water). The test failing might be necessary to trigger it, since I get e.g.:

thread 'main' panicked at 'Image data mismatch! Outlier count 2359296 over limit 460. Max difference 255', wgpu/examples/water/../../tests/common/image.rs:134:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'UNEXPECTED TEST FAILURE', wgpu/examples/water/../../tests/common/mod.rs:301:9

However, the Vulkan validation errors appear before this, and I don't know how to make the test pass to see whether it still triggers them. I don't see them when running the example normally, though. It may also be significant that the test hangs and doesn't exit when this occurs.

@nickkuk

nickkuk commented Aug 17, 2021

I don't use wgpu, just pure ash, and I get the same validation error in a case where I'm sure that I've waited for the timeline semaphore in the right way. So it could just be a noisy, incorrect validation warning.

@kvark kvark removed this from the Release v0.10 milestone Aug 17, 2021
@Imberflur
Contributor Author

I think I figured out how to get the test to not panic (by deleting the reference image), and it still appears to produce Vulkan validation errors and hang.

@kvark
Member

kvark commented Aug 18, 2021

Perhaps you could branch out the actual repro case for me to try?

@TheSpydog

I'm getting the same validation error when running the wgpu halmark example on Windows 10. If there's any environment information that would be helpful for debugging this, let me know and I can post it.

@kvark
Member

kvark commented Dec 29, 2021

@TheSpydog what validation layers version are you using?

@Imberflur
Contributor Author

> Perhaps you could branch out the actual repro case for me to try?

I completely missed this!

It seems like I can no longer reproduce this. I ran cargo test --example water on several branches (v0.10, v0.11, v0.12, and master), and none of them produced this validation error. I can only assume a driver update has resolved it.

Strangely, I also didn't get any test failures like this one which I had before:

thread 'main' panicked at 'Image data mismatch! Outlier count 2359296 over limit 460. Max difference 255', wgpu/examples/water/../../tests/common/image.rs:134:13

Either I'm running the test differently or a driver update fixed both things.

Current gpu info:
From glinfo:

AMD Radeon HD 7900 Series (TAHITI, DRM 3.40.0, 5.10.79-1-MANJARO, LLVM 13.0.0)

From vulkaninfo:

VkPhysicalDeviceDriverProperties:
---------------------------------
	driverID           = DRIVER_ID_MESA_RADV
	driverName         = radv
	driverInfo         = Mesa 21.2.5
	conformanceVersion = 1.2.3.0
VkPhysicalDeviceProperties:
---------------------------
	apiVersion        = 4202678 (1.2.182)
	driverVersion     = 88088581 (0x5402005)
VK_LAYER_KHRONOS_validation (Khronos Validation Layer) Vulkan version 1.2.199,
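
Editor's note: the parenthesised values in the vulkaninfo output above can be recovered from the raw integers. The sketch below assumes the standard Vulkan packing (major in bits 22 and up, minor in bits 12 to 21, patch in the low 12 bits). `decode_vk_version` is a name made up for this illustration. Note that `driverVersion` is vendor-defined; applying the same layout to it is an assumption that happens to agree with the "Mesa 21.2.5" string reported here.

```rust
/// Split a packed Vulkan version number into (major, minor, patch).
/// Assumes the standard layout: patch in bits 0..=11, minor in bits 12..=21,
/// major in bits 22..=28 (the top three bits hold the API variant).
fn decode_vk_version(packed: u32) -> (u32, u32, u32) {
    let major = (packed >> 22) & 0x7f;
    let minor = (packed >> 12) & 0x3ff;
    let patch = packed & 0xfff;
    (major, minor, patch)
}

fn main() {
    // apiVersion as printed by vulkaninfo in the comment above.
    let (major, minor, patch) = decode_vk_version(4202678);
    println!("apiVersion    = {}.{}.{}", major, minor, patch);

    // driverVersion: vendor-specific encoding, decoded here with the same
    // layout purely as an assumption.
    let (major, minor, patch) = decode_vk_version(0x5402005);
    println!("driverVersion = {}.{}.{}", major, minor, patch);
}
```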

@TheSpydog

> @TheSpydog what validation layers version are you using?

VK_LAYER_KHRONOS_validation (Khronos Validation Layer) Vulkan version 1.2.198

@kvark
Member

kvark commented Jan 3, 2022

Thanks! That's quite fresh.
It would be useful to know what buffer is being reported. Could you confirm that this is just one of the buffers created on your side (as opposed to us creating it internally)? If you provide "label" to the buffer descriptor, the validation layers should pick it up when reporting an error.

@TheSpydog

The problematic buffer is explicitly created as part of the halmark example. It's called "stage".

Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x3a6cbb0000000025, name = stage, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x3a6cbb0000000025[stage] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
[2022-01-03T23:23:05Z ERROR wgpu_hal::vulkan::instance]         objects: (type: BUFFER, hndl: 0x3a6cbb0000000025, name: stage)
[2022-01-03T23:23:05Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkResetCommandPool-commandPool-00040 (0xb53e2331)]
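
Editor's note: the only difference a label makes in these reports is the text inside the square brackets after the handle (`[]` in the original report, `[stage]` here). The toy formatter below only mimics that shape to make the comparison concrete. `describe_buffer` is a hypothetical helper, not code from wgpu or the validation layers.

```rust
/// Hypothetical helper: render a buffer handle in the `VkBuffer 0x...[name]`
/// shape seen in the validation messages quoted in this thread.
/// An unlabeled buffer is shown with empty brackets.
fn describe_buffer(handle: u64, label: Option<&str>) -> String {
    format!("VkBuffer 0x{:x}[{}]", handle, label.unwrap_or(""))
}

fn main() {
    // Handle from the original report, where no label was set.
    println!("{}", describe_buffer(0x4ce96a00000014e6, None));
    // Handle from the halmark run, where the buffer is labeled "stage".
    println!("{}", describe_buffer(0x3a6cbb0000000025, Some("stage")));
}
```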

@kvark
Member

kvark commented Jan 3, 2022

Hmm. Reviewing the halmark example code, everything seems to be in place:

  • staging_buffer is only used in cmd_encoder
  • it produces init_cmd, which is submitted with fence value of init_fence_value
  • device waits for init_fence_value on the same fence, indefinitely
  • then the buffer gets destroyed
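
Editor's note: the ordering described in the list above can be modeled without any GPU API. The sketch below is a toy stand-in built only on the Rust standard library: a worker thread plays the GPU, a counter guarded by a mutex and condition variable plays the fence, and `Arc::try_unwrap` plays the "is anything still using this buffer?" check. Apart from `init_fence_value`, every name in it (`Fence`, `upload_then_destroy`, and so on) is hypothetical, and none of it is wgpu-hal code.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Toy fence: a counter that only ever increases.
struct Fence {
    value: Mutex<u64>,
    changed: Condvar,
}

impl Fence {
    fn new() -> Self {
        Fence { value: Mutex::new(0), changed: Condvar::new() }
    }

    /// Called by the "GPU" once the submitted work has finished.
    fn signal(&self, new_value: u64) {
        let mut value = self.value.lock().unwrap();
        if new_value > *value {
            *value = new_value;
        }
        self.changed.notify_all();
    }

    /// Blocks until the counter reaches `target` or `timeout` elapses.
    /// Returns true if the target was reached.
    fn wait(&self, target: u64, timeout: Duration) -> bool {
        let guard = self.value.lock().unwrap();
        let (guard, _timed_out) = self
            .changed
            .wait_timeout_while(guard, timeout, |value| *value < target)
            .unwrap();
        *guard >= target
    }
}

/// Submit fake work that reads a staging buffer, wait on the fence, and only
/// then take the buffer back for destruction. Returns the buffer length.
fn upload_then_destroy(timeout: Duration) -> Result<usize, &'static str> {
    let fence = Arc::new(Fence::new());
    let staging = Arc::new(vec![0u8; 64]);
    let init_fence_value = 1;

    // "Submit": the worker holds a reference to the buffer while it runs.
    let gpu_fence = Arc::clone(&fence);
    let gpu_view = Arc::clone(&staging);
    let gpu = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20)); // pretend to copy data
        let _bytes_read = gpu_view.len();
        drop(gpu_view); // release the buffer before signaling
        gpu_fence.signal(init_fence_value);
    });

    // Wait for the same value on the same fence.
    let reached = fence.wait(init_fence_value, timeout);
    gpu.join().unwrap();
    if !reached {
        return Err("timed out waiting for the fence");
    }

    // "Destroy": only succeeds if nothing else still references the buffer.
    let buffer = Arc::try_unwrap(staging).map_err(|_| "buffer still in use")?;
    Ok(buffer.len())
}

fn main() {
    println!("{:?}", upload_then_destroy(Duration::from_secs(5)));
}
```

In this model, skipping the wait (or waiting on the wrong value) is what would make the final step fail, which is the situation the validation message describes.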

@kvark
Member

kvark commented Jan 3, 2022

@TheSpydog could you upload the run log with RUST_LOG=wgpu_hal=debug please?

@TheSpydog

Sure, here's the log: halmarklog.txt

@kvark
Member

kvark commented Feb 2, 2022

Thank you! I was mainly interested if your platform supports timeline semaphores or not, to narrow down the problematic path.
Now that we know it's timeline semaphores, I looked at our logic again and wasn't able to find any issues. It's very straightforward.
Here are some things to play with if you have time:

  1. In device.wait(&fence, init_fence_value, !0).unwrap();, check the returned value, it should be Ok(true)
  2. Try passing the last parameter as 10 instead of !0, just in case the driver gets confused by our unusual value (we multiply it by 1M before passing to Vulkan)
  3. Try doing cmd_encoder.reset_all(iter::once(init_cmd)); before device.destroy_buffer(staging_buffer);

None of these experiments should be needed, but perhaps we'll find something interesting.
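
Editor's note on experiment 2: the arithmetic behind the "unusual value" can be checked in isolation. The sketch assumes the timeout argument is a `u32` count of milliseconds (suggested by `!0`, but not stated in this thread) that is widened and multiplied by one million to get the nanosecond value Vulkan wait calls take. `timeout_ms_to_ns` and `MILLIS_TO_NANOS` are names chosen for this illustration.

```rust
/// Nanoseconds per millisecond: the "1M" factor mentioned above.
const MILLIS_TO_NANOS: u64 = 1_000_000;

/// Convert a millisecond timeout to nanoseconds.
/// Assumption: the input is a u32; widening to u64 first means the
/// multiplication cannot overflow.
fn timeout_ms_to_ns(timeout_ms: u32) -> u64 {
    u64::from(timeout_ms) * MILLIS_TO_NANOS
}

fn main() {
    let forever = timeout_ms_to_ns(!0);
    let short = timeout_ms_to_ns(10);
    // 8.64e13 is the number of nanoseconds in one day.
    println!("!0 ms -> {} ns (about {:.1} days)", forever, forever as f64 / 8.64e13);
    println!("10 ms -> {} ns", short);
}
```

Under that assumption, `!0` becomes a very large but finite timeout rather than `u64::MAX`, which is why swapping in a small value such as 10 is a cheap way to rule the driver's handling of it in or out.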

@Imberflur
Contributor Author

It seems like I can actually still reproduce this for my original case (but not in the examples). It only occurs in a very specific scenario so I had not noticed before. I will need to find some time to see if I can test this with an updated version of wgpu.

Patryk27 pushed a commit to Patryk27/wgpu that referenced this issue Nov 23, 2022
@teoxoy teoxoy added the api: vulkan Issues with Vulkan label Feb 24, 2023
@teoxoy
Member

teoxoy commented Jul 17, 2024

This sounds related to #3193 (comment).

@Imberflur could you try to reproduce the issue on 61739d9 (#5910)?

@teoxoy teoxoy self-assigned this Jul 17, 2024
@teoxoy
Member

teoxoy commented Jul 25, 2024

I think this was fixed, please reopen/open a new issue if that's not the case.

@teoxoy teoxoy closed this as completed Jul 25, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in WebGPU for Firefox Jul 25, 2024