wgpu version 0.19 performance regression #5180
Do you know why wgpu 0.19 is using significantly more memory than the previous version in the screenshots? It looks like it might be hitting a memory leak.
I don't believe memory is the issue. When a memory leak occurs, it only takes a few seconds to fill the GPU memory and freeze the entire system 😅. My intuition is that the recent changes around the read-write locks are adding excessive overhead. This might not matter much for graphics, since you typically won't render at more than 240 FPS, but for compute workloads it can be comparable to rendering at 20,000 FPS or more. If this is indeed the case (it's unclear how we could easily test it), then we could swap those read-write locks for …
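To get a rough sense of the orders of magnitude involved, here is a standalone micro-benchmark sketch (not wgpu's internal code) that measures uncontended read-lock acquisition cost; `parking_lot` is used purely as an assumed example of a lighter-weight lock, not something the thread confirms wgpu would switch to.

```rust
// Standalone micro-benchmark sketch (not wgpu's internals): a rough feel for
// uncontended read-lock acquisition cost at compute-like submission rates.
// `parking_lot` is an assumed alternative; add `parking_lot = "0.12"` to
// Cargo.toml to run the second half.
use std::hint::black_box;
use std::sync::RwLock;
use std::time::Instant;

fn main() {
    let iters = 1_000_000u32;

    let std_lock = RwLock::new(0u64);
    let t = Instant::now();
    for _ in 0..iters {
        black_box(*std_lock.read().unwrap());
    }
    println!("std::sync::RwLock:   {:?} for {} reads", t.elapsed(), iters);

    let pl_lock = parking_lot::RwLock::new(0u64);
    let t = Instant::now();
    for _ in 0..iters {
        black_box(*pl_lock.read());
    }
    println!("parking_lot::RwLock: {:?} for {} reads", t.elapsed(), iters);
}
```

Even at tens of nanoseconds per acquisition, many lock operations per submitted command buffer could become noticeable at the 20,000-submissions-per-second rates described above, while staying invisible at typical rendering frame rates.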
I see the same thing in my project: the FPS just keeps dropping and dropping. I haven't spent much time debugging it, but the drop is significant and makes my game unusable. I use compute shaders heavily. The difference: 0.18 is very playable, 0.19 is completely unplayable.
I think this and #5196 are the same issue; going to close in favor of that.
We recently released a new version of Burn that includes the latest version of wgpu, and we observed a significant performance regression. The original issue is documented here: tracel-ai/burn#1213, and it appears that reverting to wgpu version 0.18.0 resolves the problem (refer to the fix/slow-wgpu branch).

It's possible that this regression is related to the new and improved multithreading in wgpu. However, our framework follows a client-server architecture, which lets us configure the communication channel between client and server. Whether we use a mutex (involving multiple threads) or a channel with communication confined to a single thread, the performance behavior does not change.
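To make the two communication strategies concrete, here is a minimal, hypothetical sketch; the `Server` and `Command` types are illustrative and do not reflect Burn's actual API.

```rust
// Hypothetical sketch of the two communication strategies described above.
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Server {
    submissions: u64,
}

impl Server {
    fn submit(&mut self) {
        // Stand-in for encoding and submitting GPU work.
        self.submissions += 1;
    }
}

enum Command {
    Submit,
    Shutdown,
}

fn main() {
    // Strategy 1: shared server behind a mutex, called from any client thread.
    let shared = Arc::new(Mutex::new(Server { submissions: 0 }));
    shared.lock().unwrap().submit();

    // Strategy 2: dedicated server thread; clients talk to it over a channel,
    // so all GPU calls happen on a single thread.
    let (tx, rx) = mpsc::channel::<Command>();
    let handle = thread::spawn(move || {
        let mut server = Server { submissions: 0 };
        for cmd in rx {
            match cmd {
                Command::Submit => server.submit(),
                Command::Shutdown => break,
            }
        }
        server.submissions
    });
    tx.send(Command::Submit).unwrap();
    tx.send(Command::Shutdown).unwrap();
    let total = handle.join().unwrap();
    println!("channel-based server handled {} submissions", total);
}
```

In both strategies the regression shows up equally, which is why the lock overhead inside wgpu itself, rather than our threading model, is the suspected cause.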
For short-lived programs, the impact on execution time is minimal. It's worth noting that we select the adapter manually; all tests were run on a discrete GPU with the Vulkan backend on Pop!_OS.
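As an illustration of manual adapter selection restricted to the Vulkan backend and a discrete GPU, here is a minimal sketch roughly matching the wgpu 0.19 API; the exact selection logic used in the tests may differ.

```rust
// Minimal sketch: force the Vulkan backend and prefer a discrete GPU when
// enumerating adapters manually (wgpu 0.19-style API).
fn pick_adapter() -> Option<wgpu::Adapter> {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::VULKAN,
        ..Default::default()
    });
    // Enumerate Vulkan adapters and pick the first discrete GPU, if any.
    instance
        .enumerate_adapters(wgpu::Backends::VULKAN)
        .into_iter()
        .find(|adapter| adapter.get_info().device_type == wgpu::DeviceType::DiscreteGpu)
}
```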