device.create_compute_pipeline hangs #2529

peters-david · 2022-03-08T16:58:19Z

I want to run a compute shader. Everything up to and including device.create_shader_module runs without problems and without validation errors.
The next step is to call device.create_compute_pipeline, which hangs. From the system monitor I see that this call uses around 3 GB of RAM.

I created a repo where you can reproduce this issue: Issue repo

I probably just misunderstood something about WGSL, but I am not sure how to find the problem since naga doesn't give me any errors.

Is this related to the compute shader being rather big? Could you give me some pointers on how to find the problem?

Tried this on different systems:
Linux Ubuntu / Mesa Intel Iris Plus Graphics (ICL GT2)
Linux Ubuntu / NVIDIA Quadro M4000
Windows 10 / NVIDIA Quadro M4000

kvark · 2022-03-09T04:01:00Z

So it hangs on 3 different systems, technically? That's definitely unexpected!

peters-david · 2022-03-09T06:49:37Z

Yes, also tested on Ubuntu / Intel HD Graphics 3000 (SNB GT2) with the same result.

JasonS05 · 2023-05-17T23:32:32Z

Any progress on this? I tried running the WebGPU samples on the latest Firefox Nightly a few days ago and all samples with compute shaders seemed to suffer the same issue. No WebGPU graphics displayed (even the portions not requiring compute shaders, if there were any), and the RAM usage was anomalously high. 3 GB sounds about right; from my memory of the RAM graph I would estimate maybe 4 GB, but that was a few days ago. The system was a laptop with Ubuntu 22.04, an NVIDIA GPU and an Intel CPU. I think it was probably an NVIDIA RTX 30xx series card, but I don't know exactly which one. The whole computer was only $800 (2020) with 8 GB RAM and a 256 GB SSD, so nothing too high-end.

Edit: turns out it was a GTX 1650

JasonS05 · 2023-05-17T23:48:15Z

I tested just now on an iMac running macOS Mojave (10.14.6) with a "4 GHz Intel Core i7" and an "AMD Radeon R9 M295X 4 GB". While the WebGPU samples page was open, the memory usage of Firefox Nightly rose slowly but steadily without apparent limit. At 9 GB I switched to a different tab and memory usage stopped rising. Then I switched back and it resumed rising. Closing the WebGPU tab instantly dropped the memory usage to only a few hundred MB. The memory leak began when I opened the Cornell Box sample (that one specifically, other compute shader ones didn't do it) and kept leaking even when I switched to other samples, even the simplest one. Only closing the tab cured the leak. This particular sample also gave an error regarding usage of an unsupported texture format (I think "bgra8unorm" or something). Also, in order to get any of the samples to run, I had to enable the "gfx.webgpu.ignore-blocklist" setting in about:config and restart the browser.

ErichDonGubler · 2023-05-18T04:45:24Z

Does this issue also happen in Google Chrome?

JasonS05 · 2023-05-18T06:00:06Z

In the WebGPU samples website all samples worked on Chrome on my iMac except the Cornell Box. So the compute capabilities function fine. As for the Cornell Box memory leak, I'll test that tomorrow on Chrome.

JasonS05 · 2023-05-18T21:39:59Z

Ok that's strange. Today on Chrome WebGPU isn't working at all on my iMac. I'm just getting TypeError: Cannot read properties of null (reading 'requestDevice'). I even tried enabling WebGPU developer features and no luck. Chrome version is 113.0.5672.92. But I know it definitely, positively worked a few days ago. Either that or I'm seriously hallucinating.

teoxoy · 2023-06-05T10:18:27Z

@JasonS05 the issues you are facing might not be related to this bug report. Please try to reproduce this issue by trying to run the repo linked in the description or file a bug here for Firefox issues.

teoxoy · 2023-06-05T11:00:49Z

I'm hitting #4393 while trying to run this on the DX12 backend.
On Vulkan, it doesn't hang but takes minutes for the pipeline to be created.

@peters-david was the call to create_compute_pipeline just slow or was it really hanging (never completing)?

peters-david · 2023-06-05T11:49:13Z

@teoxoy It never completed for me. The longest I waited was around 30 minutes. It may have completed after that but I didn't bother to wait longer.
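One way to tell "very slow" from "never completes" without watching the process by hand is to run the suspect call on a worker thread and wait for it with a deadline. The following is a minimal sketch using only the Rust standard library; `run_with_timeout` is a hypothetical helper, not part of wgpu, and it assumes the closure can own (or share, for example through an `Arc`) whatever handles the call needs.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a worker thread and waits at most `limit` for its result.
/// Returns `None` if the work has not finished in time (or if it panicked).
/// On timeout the worker thread is left running, because a blocked call
/// cannot be cancelled safely from the outside.
fn run_with_timeout<T, F>(limit: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: the receiver is gone if we already timed out.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}
```

In the repro, the closure would wrap the call to create_compute_pipeline; a `None` after a generous limit (say 30 minutes) is only evidence of a hang, not proof, since the call may still finish later.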

teoxoy · 2023-06-05T12:02:23Z

Did you notice the RAM usage continuously increasing? That's what I noticed while it was creating the pipeline.

Also, this issue is one year old, do you have any new findings?

peters-david · 2023-06-05T12:06:04Z

@teoxoy Yes, ram usage increased. I didn't really work on it since then, sorry, but I can test again if it helps.

teoxoy · 2023-06-05T13:11:19Z

I see, np. If you'd be able to narrow down the slowness to a specific section of code within wgpu by profiling the test app that would be appreciated.
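Before reaching for a full profiler, a coarse first step could be to time each setup call in the test app and confirm which one dominates. This is a minimal sketch with a hypothetical `timed` helper built on `std::time::Instant`; it does not look inside wgpu, so a sampling profiler is still needed to narrow the time down to a specific section of wgpu code.

```rust
use std::time::{Duration, Instant};

/// Runs `stage`, prints how long it took to stderr, and returns the
/// stage's result together with the elapsed wall-clock time.
fn timed<T>(label: &str, stage: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = stage();
    let elapsed = start.elapsed();
    eprintln!("{label}: {elapsed:?}");
    (value, elapsed)
}
```

Each call in the setup path (shader module creation, pipeline creation, and so on) can be wrapped in its own `timed("...", || ...)` closure, which keeps the original control flow intact while printing one line per stage.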

JasonS05 · 2023-06-06T03:05:01Z

> @JasonS05 the issues you are facing might not be related to this bug report. Please try to reproduce this issue by trying to run the repo linked in the description or file a bug here for Firefox issues.

I tried compiling the linked repo just now but it had several compile errors. As I am not familiar with rust I do not know how to proceed. These are the error messages:

   Compiling test v0.1.0 (/home/jason/Desktop/Coding Stuff/github/peters-david.test)
error[E0308]: mismatched types
    --> src/gpu.rs:19:60
     |
19   |         let instance: wgpu::Instance = wgpu::Instance::new(wgpu::Backends::all());
     |                                        ------------------- ^^^^^^^^^^^^^^^^^^^^^ expected struct `InstanceDescriptor`, found struct `Backends`
     |                                        |
     |                                        arguments to this function are incorrect
     |
note: associated function defined here
    --> /home/jason/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/8b6599b/wgpu/src/lib.rs:1343:12
     |
1343 |     pub fn new(instance_desc: InstanceDescriptor) -> Self {
     |            ^^^

error[E0308]: mismatched types
    --> src/gpu.rs:67:46
     |
67   |       let shader = device.create_shader_module(&wgpu::ShaderModuleDescriptor {
     |  _________________________--------------------_^
     | |                         |
     | |                         arguments to this function are incorrect
68   | |         label: None,
69   | |         source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(include_str!("shader.wgsl"))),
70   | |     });
     | |_____^ expected struct `ShaderModuleDescriptor`, found `&ShaderModuleDescriptor<'_>`
     |
note: associated function defined here
    --> /home/jason/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/8b6599b/wgpu/src/lib.rs:1948:12
     |
1948 |     pub fn create_shader_module(&self, desc: ShaderModuleDescriptor) -> ShaderModule {
     |            ^^^^^^^^^^^^^^^^^^^^
help: consider removing the borrow
     |
67   -     let shader = device.create_shader_module(&wgpu::ShaderModuleDescriptor {
67   +     let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
     |

error[E0599]: no method named `dispatch` found for struct `ComputePass` in the current scope
   --> src/gpu.rs:130:22
    |
130 |         compute_pass.dispatch(1, 1, 1); // Number of cells to run, the (x,y,z) size of item being processed
    |                      ^^^^^^^^ method not found in `ComputePass<'_>`

error[E0061]: this function takes 2 arguments but 1 argument was supplied
    --> src/gpu.rs:142:54
     |
142  |     let cpu_buffer_out_future = cpu_buffer_out_slice.map_async(wgpu::MapMode::Read);
     |                                                      ^^^^^^^^^--------------------- an argument is missing
     |
note: associated function defined here
    --> /home/jason/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/8b6599b/wgpu/src/lib.rs:2547:12
     |
2547 |     pub fn map_async(
     |            ^^^^^^^^^
help: provide the argument
     |
142  |     let cpu_buffer_out_future = cpu_buffer_out_slice.map_async(wgpu::MapMode::Read, /* value */);
     |                                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: `()` is not a future
   --> src/gpu.rs:146:42
    |
146 |     if let Ok(()) = cpu_buffer_out_future.await {
    |                                          ^^^^^^
    |                                          |
    |                                          `()` is not a future
    |                                          help: remove the `.await`
    |
    = help: the trait `Future` is not implemented for `()`
    = note: () must be a future or must implement `IntoFuture` to be awaited
    = note: required for `()` to implement `IntoFuture`

Some errors have detailed explanations: E0061, E0277, E0308, E0599.
For more information about an error, try `rustc --explain E0061`.
error: could not compile `test` due to 5 previous errors

As for filing a bug report with Bugzilla, I do not have an account there so I won't be posting a bug report there for the moment.

teoxoy · 2023-06-06T13:02:46Z

Add rev = "0ac9ce002656565ccd05b889f5856f4e2c38fa73" (it was the latest commit on the day the bug was filed) to the wgpu entry in Cargo.toml.

> As for filing a bug report with Bugzilla, I do not have an account there so I won't be posting a bug report there for the moment.

Logging in via github should work - but up to you.

JasonS05 · 2023-06-08T02:53:14Z

Unfortunately, that made a new error. Something about no suitable version of web-sys. Full error here:

    Updating git repository `https://github.com/gfx-rs/wgpu`
    Updating crates.io index
    Updating git repository `https://github.com/gfx-rs/naga`
    Updating git repository `https://github.com/gfx-rs/metal-rs`
error: failed to select a version for `web-sys`.
    ... required by package `wgpu v0.12.0 (https://github.com/gfx-rs/wgpu?rev=0ac9ce002656565ccd05b889f5856f4e2c38fa73#0ac9ce00)`
    ... which satisfies git dependency `wgpu` of package `test v0.1.0 (/home/jason/Desktop/Coding Stuff/github/peters-david.test)`
versions that meet the requirements `^0.3.53` (locked to 0.3.63) are: 0.3.63

the package `wgpu` depends on `web-sys`, with features: `GpuBufferUsage` but `web-sys` does not have these features.


failed to select a version for `web-sys` which could resolve this conflict

teoxoy · 2023-06-08T12:38:25Z

Run cargo clean and delete the Cargo.lock file (at least that's what I did).

JasonS05 · 2023-06-08T21:04:35Z

Ok, it works now. When I ran the program it seemed to hang at the described spot with a total system memory usage hovering around 5.8 GiB. After a couple minutes it finished whatever it was doing and the program exited normally leaving the system memory at 3.9 GiB. Running the program again does not reproduce the hang and the whole thing executes in under a second.

This is with my Ubuntu 22.04.2 LTS, GTX 1650 system

chancehudson · 2023-11-18T12:11:44Z

I ran into this issue on a Macbook Air with an M1 processor. I found the cause to be a large multi-dimensional array in the workgroup memory space. I made a minimal repro case here: https://github.com/vimwitch/webgpu-hang-repro

Some things I noticed during testing:

Problem does not occur for storage memory
Problem occurs with single dimensional arrays
After waiting for the pipeline to be created once, subsequent creations do not hang. Changing the size of the array causes the next creation to hang. Reverting to the previous value after waiting for the changed size to be created does not result in another hang.
Changing shader logic causes the hang to occur again
Changing workgroup size does not cause hang to occur again
If the shader logic does not touch the array the hang does not occur
During the hang system memory and CPU use is unaffected
During the hang the program CPU use is 0, memory use is constant at 3.9 MB

Apple M1 Macbook Air OSX 12.5
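To probe how the stall depends on the array size, one option is to generate the shader source with the workgroup array length as a parameter and time pipeline creation for each size. The sketch below is not the linked repro; `workgroup_array_shader`, `scratch` and `dst` are hypothetical names, and the shader deliberately reads and writes the array because, per the notes above, the hang does not occur if the array is never touched. It assumes `len` is at least 1 and small enough to fit the device's workgroup storage limit.

```rust
/// Builds a WGSL compute shader whose workgroup array has `len` elements.
/// Assumes `len >= 1`; WGSL does not allow zero-sized fixed arrays.
fn workgroup_array_shader(len: u32) -> String {
    format!(
        r#"@group(0) @binding(0) var<storage, read_write> dst: array<u32>;

var<workgroup> scratch: array<u32, {len}>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_index) i: u32) {{
    // Touch the array so the shader compiler cannot drop it.
    scratch[i % {len}u] = i;
    workgroupBarrier();
    dst[i] = scratch[(i + 1u) % {len}u];
}}
"#
    )
}
```

Because changing the array size or the shader logic reportedly triggers the stall again while repeated creation of an unchanged shader does not, each generated size should be treated as a fresh measurement.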

chancehudson · 2023-11-18T12:40:06Z

I profiled the repro above: https://share.firefox.dev/3G3Al3W

Forpee · 2023-12-15T14:52:06Z

Is there any progress on this issue? I'm encountering a similar problem where my program stalls on the device.create_compute_pipeline line.

To me, it looks like the compute pipeline pre-runs the shader on the first look, which causes this long stall before it completes. I noticed that the more time-intensive the functions I ran in my main function, the longer the pipeline took to complete.

jimblandy · 2024-05-20T05:26:07Z

Assigning Teo to try to reproduce, investigate cause, and estimate size.

kvark added the "area: correctness" and "type: bug" labels on Mar 9, 2022

teoxoy added this to the WebGPU Specification V1 milestone on Dec 5, 2022

teoxoy added the "backend: vulkan" label on Jun 5, 2023

kugimasa mentioned this issue on Aug 29, 2023: "Actions build starts failing after 37f075" (kugimasa/WebGPUTracer#1, open)

teoxoy added the "backend: metal" label on Nov 20, 2023

teoxoy added this to WebGPU for Firefox Jan 18, 2024

jimblandy assigned teoxoy May 20, 2024

teoxoy removed their assignment Jul 26, 2024
