Vulkan hangs after a certain render sequence, and either panics, hangs or loses device #983
Comments
984: Fix locking of device lifetime tracker on resource drop r=kvark a=kvark **Connections** Fixes the player hang in #983 **Description** It turns out the current type-level protection from locking device's lifetime tracker is not working properly. TODO is left. **Testing** Tested on the trace from #983 Co-authored-by: Dzmitry Malyshau <[email protected]>
I took some time to bisect the driver version where the hang occurs on my machine:
It looks like there are validation errors when it panics running desktop (swap chain + multiple frames). The panic happens about 3 minutes after the first frame submission.
Edit: Actually, that was when testing with #987, with commit kvark@3b76651. One commit before, kvark@3be2c45, there are no validation errors:
Oh interesting, thank you! I'll try leaving it running for 5 minutes, I guess :)
986: [0.6] Allign stencil reference between state setting and pipeline creation r=cwfitzgerald a=kvark **Connections** Found this when running Ruffle from #983 **Description** There are 2 problems: 1. setting the stencil reference can't be valid if the pipeline doesn't expect this (d'oh) 2. our code that created the pipeline used slightly stricter conditions than the code setting the stencil, so these diverged a tiny bit **Testing** Tested on Ruffle Co-authored-by: Dzmitry Malyshau <[email protected]>
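As a rough illustration of the second problem described in 986, the sketch below routes both pipeline creation and stencil-reference setting through one shared predicate, so the two conditions cannot drift apart. Every type and function name here (`StencilFace`, `needs_reference`, `create_pipeline`, `set_stencil_reference`) is a hypothetical stand-in and not the actual wgpu code; the exact rule for when a reference value is needed is also an assumption.

```rust
// Hypothetical, simplified stand-ins for the real stencil state types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum CompareFunction {
    Never,
    Less,
    Equal,
    Always,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StencilOperation {
    Keep,
    Zero,
    Replace,
}

#[derive(Clone, Copy, Debug)]
pub struct StencilFace {
    pub compare: CompareFunction,
    pub fail_op: StencilOperation,
    pub depth_fail_op: StencilOperation,
    pub pass_op: StencilOperation,
}

impl StencilFace {
    /// Assumed rule: the reference value matters if the comparison reads it
    /// or if any operation writes it back via `Replace`.
    pub fn needs_reference(&self) -> bool {
        let compare_uses_ref =
            !matches!(self.compare, CompareFunction::Never | CompareFunction::Always);
        let op_uses_ref = [self.fail_op, self.depth_fail_op, self.pass_op]
            .contains(&StencilOperation::Replace);
        compare_uses_ref || op_uses_ref
    }
}

/// Hypothetical pipeline record that remembers whether a reference is expected.
pub struct Pipeline {
    pub expects_stencil_reference: bool,
}

/// Pipeline creation decides once, using the shared predicate.
pub fn create_pipeline(front: StencilFace, back: StencilFace) -> Pipeline {
    Pipeline {
        expects_stencil_reference: front.needs_reference() || back.needs_reference(),
    }
}

/// State setting consults the pipeline instead of re-deriving the condition.
/// Returns the value that would be forwarded to the backend, or `None` if skipped.
pub fn set_stencil_reference(pipeline: &Pipeline, value: u32) -> Option<u32> {
    if pipeline.expects_stencil_reference {
        Some(value)
    } else {
        None
    }
}
```

The design point is only that the decision is made in one place; how the real code expresses it may differ.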
Is this still an issue, now that we landed the gfx-memory fix?
We didn't use 0.2.1 for this, we were locked on 0.2.0. I upgraded to 0.2.2 just in case, but the issue still persists. I think that was a separate issue that I initially confused with this because
What exactly are the repro steps now? Run
Two repro steps, but it looks like you need to be on Windows with a GeForce driver >= 456.38
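For anyone triaging reports against this threshold, here is a minimal sketch of a version check. The helper names (`parse_driver_version`, `may_be_affected`) are hypothetical, and the 456.38 cutoff is only what was observed on one machine in this thread, not a confirmed boundary. It assumes the NVIDIA "major.minor" format and returns `None` for anything else, such as four-part Windows driver version strings.

```rust
/// Parse an NVIDIA-style "major.minor" driver version such as "456.38".
/// A bare major version like "443" is treated as minor 0.
pub fn parse_driver_version(s: &str) -> Option<(u32, u32)> {
    let mut parts = s.trim().splitn(2, '.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = match parts.next() {
        Some(m) => m.parse().ok()?,
        None => 0,
    };
    Some((major, minor))
}

/// First driver version reported in this thread as showing the hang (assumption).
pub const FIRST_AFFECTED: (u32, u32) = (456, 38);

/// `Some(true)` if the version is at or above the reported threshold,
/// `Some(false)` if below, `None` if the string could not be parsed.
pub fn may_be_affected(version: &str) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor.
    parse_driver_version(version).map(|v| v >= FIRST_AFFECTED)
}
```

With the versions mentioned in this thread, `may_be_affected("457.30")` yields `Some(true)` and `may_be_affected("443")` yields `Some(false)`.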
@Dinnerbone finally got to test this on Windows/NV GTX 1050 Ti/Vulkan. It runs fine... Although my driver version is 443, and it's the latest Lenovo considers valid for this Thinkpad X1 Extreme. Force-installing anything fresher may invite more trouble than it's worth. Looks like you found a genuine NVidia Vulkan bug. Looking forward to seeing if they respond!
Hang is unfortunately still occurring in the latest 2 Nvidia driver versions, 457.09 and 457.30 (November 2020), and with gfx-rs/wgpu-rs@2563f20
Here are minimal repro traces that hang on my machine in
This only occurs with the Vulkan backend. The traces using the DX12 backend replay correctly. The diff between the two traces boils down to the additional buffer creation and render pass. Removing the buffer copy in trace-bad trace.ron line 442 causes the replay to run successfully (but only displays one triangle):
How to create the trace:
Running the trace using wgpu/player: Windows 10 64-bit
Do you guys have an API trace for a fresh version of
Interestingly, the "trace-bad" is replayed without issues here on "GTX 1050 Ti Max-Q" driver version 27.21.14.5256.
I'm going to close this as out of date. If you are still having issues, please file a new bug.
Description
It's a little hard to turn this into a small repro case so please forgive the vagueness.
After submitting a frame to wgpu whilst using Vulkan backend, Vulkan seems to become unstable and this manifests itself in a few ways:
Our application has two ways to reproduce the bug:
In this second case, we perform the following sequence of events:
The submit seemingly returns okay, but the texture is completely empty when we'd expect to see some graphics in it. The application then freezes (at least for me on Windows; this seems to vary) when dropping wgpu::Instance. For reference, the image it spits out should be identical to this one.
I've taken a trace of this single-frame capture and had to manually close the toml as the recording can't finish. This seems to freeze when played back, but I'm unable to get RenderDoc to play nice and see anything from it.
This worked for us in the past. I think as recently as 24 days ago I was running this without issues. The same code, unchanged, no longer works today.
Repro steps
I haven't been able to create a minimal reproducible example, but you can see it in our project with the following steps:
cargo run --package=ruffle_desktop -- test.swf
if you want to see it visually, with multiple frames
cargo run --package=exporter -- test.swf
if you want to see the single frame saved to a texture on disk
You can apply this commit to reduce the amount of rendering done to the bare minimum that still crashes, with that particular swf: Dinnerbone/ruffle@b4f173d
Expected vs observed behavior
I expect to either get an error describing how we're using wgpu wrong, or for it to work :D
Extra materials
Platform
Reproduced on Windows.
Only affects Vulkan backend. We're seeing some instability with DX12 but not certain it's related yet.
Reproduced on wgpu 0.6 and gfx-rs/wgpu-rs@e3eadca