wgpu version 0.19 performance regression #5180

Closed
nathanielsimard opened this issue Feb 1, 2024 · 5 comments

Comments

@nathanielsimard

We recently released a new version of Burn that includes the latest version of wgpu, and we observed a significant performance regression. The original issue is documented here: tracel-ai/burn#1213, and it appears that reverting to wgpu version 0.18.0 resolves the problem (refer to the fix/slow-wgpu branch).

It's possible that this regression is related to the new and improved multithreading in wgpu. However, our framework follows a client-server architecture, allowing us to configure the communication channel between them. Whether using a mutex (involving multiple threads) or a channel with communication limited to a single thread, we observed no change in performance behavior.

For short-lived programs, the impact on execution time is minimal. It's worth noting that we select the adapter manually, and I ran all tests on a discrete GPU with the Vulkan backend on Pop OS.

@Elabajaba
Contributor

Do you know why the wgpu 0.19 version is taking significantly more memory than the previous version in the screenshots? It looks like it might be hitting a memory leak?

@nathanielsimard
Author

I don't believe memory is the issue. When a memory leak occurs, it only takes a few seconds to fill the GPU memory and freeze the entire system 😅. My intuition suggests that the recent changes with the read-write locks are causing excessive overhead. While this might not be significant for graphics since you typically won't render at more than 240 FPS, for compute tasks, it could be comparable to rendering at 20,000 FPS or even more.

If this is indeed the case (I'm not sure how we could easily test it), then we could swap those read-write locks for RefCell behind a feature flag. This way, we avoid altering the architecture while enabling single-threaded execution with minimal overhead.
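To make the feature-flag idea concrete, here is a minimal sketch of how a lock type could be swapped at compile time. This is not wgpu code: the `Shared` wrapper, its `with_read`/`with_write` methods, the `bump_many` helper, and the `single-threaded` feature name are all hypothetical and exist only for illustration.

```rust
// Hypothetical sketch: `Shared`, `bump_many`, and the `single-threaded`
// feature name are illustrative assumptions, not part of wgpu.

// Default build: thread-safe variant backed by `RwLock`.
#[cfg(not(feature = "single-threaded"))]
mod imp {
    use std::sync::RwLock;

    pub struct Shared<T>(RwLock<T>);

    impl<T> Shared<T> {
        pub fn new(value: T) -> Self {
            Shared(RwLock::new(value))
        }

        /// Runs `f` with shared access to the value.
        pub fn with_read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
            let guard = self.0.read().unwrap();
            f(&*guard)
        }

        /// Runs `f` with exclusive access to the value.
        pub fn with_write<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
            let mut guard = self.0.write().unwrap();
            f(&mut *guard)
        }
    }
}

// Opt-in build: same API backed by `RefCell`, so no atomic operations are
// involved. The resulting type is not `Sync` and cannot be shared across
// threads.
#[cfg(feature = "single-threaded")]
mod imp {
    use std::cell::RefCell;

    pub struct Shared<T>(RefCell<T>);

    impl<T> Shared<T> {
        pub fn new(value: T) -> Self {
            Shared(RefCell::new(value))
        }

        /// Runs `f` with shared access to the value.
        pub fn with_read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
            let guard = self.0.borrow();
            f(&*guard)
        }

        /// Runs `f` with exclusive access to the value.
        pub fn with_write<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
            let mut guard = self.0.borrow_mut();
            f(&mut *guard)
        }
    }
}

use imp::Shared;

/// Calling code is identical for both variants.
pub fn bump_many(n: u64) -> u64 {
    let counter = Shared::new(0u64);
    for _ in 0..n {
        counter.with_write(|c| *c += 1);
    }
    counter.with_read(|c| *c)
}
```

Because both variants expose the same methods, call sites stay unchanged and only the feature selection differs. Whether this would actually recover the lost performance is untested here; it only illustrates the shape of the proposal.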

@hakolao
Contributor

hakolao commented Feb 5, 2024

I see the same problem in my project. FPS slowly keeps dropping. I haven't had time to debug it and figure out why, but the drop is significant and makes my game unusable.

I use compute shaders a lot.

The difference: 0.18 is very playable, while 0.19 is completely unplayable.

@cryscan

cryscan commented Feb 5, 2024

@hakolao It seems that in v0.19, the program is waiting on CPUs a lot in wgpu::ComputePipeline::get_bind_group_layout. See #5196

@cwfitzgerald
Member

I think this and #5196 are the same, so I'm going to close this in favor of that.

@cwfitzgerald closed this as not planned on Feb 5, 2024