Queue[Id(1,2,d3d12)] does not exist internal panic during Queue::write_buffer() on DX12 backend #5092

Closed
kpreid opened this issue Jan 18, 2024 · 8 comments · Fixed by #6070
Labels
type: bug Something isn't working

Comments

@kpreid
Contributor

kpreid commented Jan 18, 2024

Description
I observed the following panic as an intermittent test failure in my own project's CI when attempting to update to wgpu 0.19. The tests share one Adapter between test threads, but each test uses its own Device and Queue. I don't know if the failure is Windows-specific as it has only happened once so far.

thread 'tests::scale_to_integer_step_test' panicked at ...\wgpu-core-0.19.0\src\storage.rs:113:39:
Queue[Id(1,2,d3d12)] does not exist
stack backtrace:
...
    2: wgpu_core::storage::Storage<wgpu_core::pipeline::ShaderModule<wgpu_hal::gles::Api>,wgpu_core::id::Id<wgpu_core::pipeline::ShaderModule<wgpu_hal::empty::Api> > >::get<wgpu_core::pipeline::ShaderModule<wgpu_hal::gles::Api>,wgpu_core::id::Id<wgpu_core::pipel
              at ...\wgpu-core-0.19.0\src\storage.rs:113
    3: wgpu_core::registry::Registry<wgpu_core::id::Id<wgpu_core::pipeline::ComputePipeline<wgpu_hal::empty::Api> >,wgpu_core::pipeline::ComputePipeline<wgpu_hal::dx12::Api> >::get<wgpu_core::id::Id<wgpu_core::pipeline::ComputePipeline<wgpu_hal::empty::Api> >,wg
              at ...\wgpu-core-0.19.0\src\registry.rs:136
    4: wgpu_core::global::Global<wgpu_core::identity::IdentityManagerFactory>::queue_write_buffer<wgpu_core::identity::IdentityManagerFactory,wgpu_hal::dx12::Api>
              at ...\wgpu-core-0.19.0\src\device\queue.rs:377
    5: wgpu::backend::wgpu_core::impl$7::queue_write_buffer
              at ...\wgpu-0.19.0\src\backend\wgpu_core.rs:2092
    6: wgpu::context::impl$5::queue_write_buffer<wgpu::backend::wgpu_core::ContextWgpuCore>
              at ...\wgpu-0.19.0\src\context.rs:2911
    7: wgpu::Queue::write_buffer
              at ...\wgpu-0.19.0\src\lib.rs:4582
...

Repro
No standalone repro yet.

Platform
wgpu: 0.19.0

GitHub Actions Windows runner:

Current runner version: '2.311.0'
Operating System
  Microsoft Windows Server 2022
  10.0.20348
  Datacenter
Runner Image
  Image: windows-2022
  Version: 20240108.1.0
  Included Software: https://github.com/actions/runner-images/blob/win22/20240108.1/images/windows/Windows2022-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20240108.1
@AdrianEddy
Contributor

I've also seen this panic (I believe it started after arcanization landed), but I haven't investigated the cause and I don't have a clear reproduction case; it seems to happen randomly for me.

@cwfitzgerald
Member

If this can be reproduced with trace logging active, we might be able to get to the bottom of it.

Unfortunately, this may also throw off any timing component

@AdrianEddy
Contributor

I have something like that:

21:11:24 [ERROR] Uncaptured device error: Validation { source: ContextError { string: "Queue::write_buffer", cause: Queue(InvalidQueueId), label_key: "", label: "" }, description: "Validation Error\n\nCaused by:\n    In Queue::write_buffer\n    QueueId is invalid\n" }
21:11:24 [ERROR] Uncaptured device error: Validation { source: ContextError { string: "Queue::write_buffer", cause: Queue(InvalidQueueId), label_key: "", label: "" }, description: "Validation Error\n\nCaused by:\n    In Queue::write_buffer\n    QueueId is invalid\n" }
21:11:24 [ERROR] Uncaptured device error: Validation { source: ContextError { string: "Queue::write_buffer", cause: Queue(InvalidQueueId), label_key: "", label: "" }, description: "Validation Error\n\nCaused by:\n    In Queue::write_buffer\n    QueueId is invalid\n" }
21:11:25 [ERROR] thread '<unnamed>' panicked at 'Error in Queue::submit: Validation Error

Caused by:
    QueueId is invalid
': C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu\src\backend\wgpu_core.rs:2228
   0: backtrace::backtrace::trace
   1: backtrace::capture::Backtrace::new
   2: <backtrace::capture::Backtrace as core::default::Default>::default
   3: <log_panics::Shim as core::fmt::Debug>::fmt
   4: alloc::boxed::impl$49::call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\alloc\src\boxed.rs:2021
   5: std::panicking::rust_panic_with_hook
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:783
   6: std::panicking::begin_panic_handler::closure$0
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:657
   7: std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0,never$>
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\sys_common\backtrace.rs:170
   8: std::panicking::begin_panic_handler
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:645
   9: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:72
  10: <wgpu::backend::wgpu_core::ContextWgpuCore as core::fmt::Debug>::fmt
  11: <T as wgpu::context::DynContext>::queue_submit

trace.zip

hope this helps

@cwfitzgerald
Member

Oh that's the wrong kind of trace, sorry :)

I meant trace level logging - either through RUST_LOG=trace if you're using env_logger, or however you configured it

@AdrianEddy
Contributor

Ok I have one
log.zip

@dtzxporter
Contributor

This log looks like it can happen on vk too, probably other backends as well.

@kpreid
Contributor Author

kpreid commented Feb 7, 2024

This log looks like it can happen on vk too, probably other backends as well.

Yep, just had a macOS flake in the same test case I originally posted:

  thread 'tests::scale_to_integer_step_test' panicked at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.0/src/storage.rs:113:39:
  Queue[Id(0,2,mtl)] does not exist
  stack backtrace:
...
     2: wgpu_core::storage::Storage<T,I>::get
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.0/src/storage.rs:113:39
     3: wgpu_core::storage::Storage<T,I>::get_owned
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.0/src/storage.rs:128:23
     4: wgpu_core::registry::Registry<I,T>::get
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.0/src/registry.rs:136:9
     5: wgpu_core::device::queue::<impl wgpu_core::global::Global<G>>::queue_write_buffer
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.0/src/device/queue.rs:377:21
     6: <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::queue_write_buffer
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.1/src/backend/wgpu_core.rs:2092:15
     7: <T as wgpu::context::DynContext>::queue_write_buffer
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.1/src/context.rs:2911:9
     8: wgpu::Queue::write_buffer
               at /Users/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.1/src/lib.rs:4582:9
...

@teoxoy teoxoy self-assigned this Aug 2, 2024
@teoxoy teoxoy added the type: bug Something isn't working label Aug 2, 2024
@teoxoy
Member

teoxoy commented Aug 2, 2024

The problem stems from the device and queue IDs not being the same, even though we assume they are in multiple places throughout the codebase.

22:37:20 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:165] User is inserting DeviceId(0,1,vk)
22:37:20 [TRACE] (38) wgpu_core::instance: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\instance.rs:1089] Created Device Id(0,1,vk)
22:37:20 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:165] User is inserting QueueId(0,1,vk)
22:37:20 [TRACE] (38) wgpu_core::instance: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\instance.rs:1095] Created Queue Id(0,1,vk)
...
22:37:20 [TRACE] (1) wgpu_core::device::global: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\device\global.rs:2303] Queue::drop Id(0,1,vk)
22:37:20 [TRACE] (1) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:202] User is removing QueueId(0,1,vk)
22:37:20 [TRACE] (1) wgpu_core::device::global: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\device\global.rs:2226] Device::drop Id(0,1,vk)
22:37:20 [TRACE] (1) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:202] User is removing DeviceId(0,1,vk)
...
22:37:30 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:165] User is inserting DeviceId(0,1,d3d12)
22:37:30 [TRACE] (38) wgpu_core::instance: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\instance.rs:1089] Created Device Id(0,1,d3d12)
22:37:30 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:165] User is inserting QueueId(0,1,d3d12)
22:37:30 [TRACE] (38) wgpu_core::instance: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\instance.rs:1095] Created Queue Id(0,1,d3d12)
...
22:37:31 [TRACE] (38) wgpu_core::device::global: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\device\global.rs:2303] Queue::drop Id(0,1,d3d12)
22:37:31 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:202] User is removing QueueId(0,1,d3d12)
22:37:31 [TRACE] (38) wgpu_core::device::global: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\device\global.rs:2226] Device::drop Id(0,1,d3d12)
22:37:31 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:202] User is removing DeviceId(0,1,d3d12)
...
22:37:31 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:165] User is inserting DeviceId(1,1,vk)
22:37:31 [TRACE] (38) wgpu_core::instance: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\instance.rs:1089] Created Device Id(1,1,vk)
22:37:31 [TRACE] (38) wgpu_core::storage: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\storage.rs:165] User is inserting QueueId(0,2,vk)
22:37:31 [TRACE] (38) wgpu_core::instance: [C:\Users\Eddy\.cargo\git\checkouts\wgpu-53e70f8674b08dd4\adf1e3b\wgpu-core\src\instance.rs:1095] Created Queue Id(0,2,vk)

ready(Ok((device_id, device, device_id, queue)))

We still do this on trunk:

ready(Ok((device_id, device, device_id.into_queue_id(), queue)))

wgpu/wgpu-core/src/id.rs

Lines 329 to 333 in 9c6ae1b

impl DeviceId {
    pub fn into_queue_id(self) -> QueueId {
        Id(self.0, PhantomData)
    }
}

Given that devices and queues can have different lifetimes from the user's perspective, we should fully decouple their IDs.
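
To illustrate the mismatch, here is a minimal sketch, assuming a simplified registry that hands out (index, epoch) pairs and reuses freed indices with a bumped epoch. The `Id`, `Registry`, and `simulate` names are hypothetical and are not wgpu-core's actual types. The sequence only mirrors the end of the trace log above, where the second device is assigned Id(1,1) while its queue is assigned Id(0,2).

```rust
/// Hypothetical stand-in for a resource ID: a slot index plus a reuse epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Id {
    index: usize,
    epoch: u32,
}

/// Hypothetical, simplified registry (not wgpu-core's real implementation).
struct Registry<T> {
    slots: Vec<Option<(u32, T)>>, // occupied slots hold (epoch, value)
    epochs: Vec<u32>,             // last epoch handed out per index
    free: Vec<usize>,             // indices available for reuse
}

impl<T> Registry<T> {
    fn new() -> Self {
        Registry { slots: Vec::new(), epochs: Vec::new(), free: Vec::new() }
    }

    /// Stores a value, reusing a freed index if one exists.
    fn insert(&mut self, value: T) -> Id {
        let index = match self.free.pop() {
            Some(i) => i,
            None => {
                self.slots.push(None);
                self.epochs.push(0);
                self.slots.len() - 1
            }
        };
        self.epochs[index] += 1;
        let epoch = self.epochs[index];
        self.slots[index] = Some((epoch, value));
        Id { index, epoch }
    }

    /// Removes the value only if both index and epoch match.
    fn remove(&mut self, id: Id) -> Option<T> {
        let slot = self.slots.get_mut(id.index)?;
        if slot.as_ref()?.0 != id.epoch {
            return None;
        }
        let (_, value) = slot.take()?;
        self.free.push(id.index);
        Some(value)
    }

    /// Looks up a value; a stale or foreign ID yields `None`.
    fn get(&self, id: Id) -> Option<&T> {
        let (epoch, value) = self.slots.get(id.index)?.as_ref()?;
        (*epoch == id.epoch).then_some(value)
    }
}

/// Returns (device B id, queue B id, lookup via device id works, lookup via queue id works).
fn simulate() -> (Id, Id, bool, bool) {
    let mut devices: Registry<&str> = Registry::new();
    let mut queues: Registry<&str> = Registry::new();

    let _device_a = devices.insert("device A"); // Id(0,1)
    let queue_a = queues.insert("queue A"); // Id(0,1)

    // Queue A's slot is released while device A's slot is still occupied.
    let _dropped = queues.remove(queue_a);

    let device_b = devices.insert("device B"); // fresh index: Id(1,1)
    let queue_b = queues.insert("queue B"); // reused index: Id(0,2)

    // Reusing the device ID as a queue ID (the assumption discussed above) misses.
    let via_device_id = queues.get(device_b).is_some();
    let via_queue_id = queues.get(queue_b).is_some();
    (device_b, queue_b, via_device_id, via_queue_id)
}

fn main() {
    let (device_b, queue_b, via_device_id, via_queue_id) = simulate();
    println!("device B = {:?}, queue B = {:?}", device_b, queue_b);
    println!("lookup via device id: {via_device_id}, via queue id: {via_queue_id}");
}
```

In this sketch the two registries allocate independently, so the IDs line up only while devices and queues are created and released in lockstep. Returning the queue's own ID from device creation would avoid relying on that coincidence.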

Projects
Status: Done
5 participants