[WIP] Arcanization #2272
Any progress on this? This is blocking Rend3 #350, which is really slowing down my Second Life / Open Simulator renderer. Second Life is a modern multiverse system where content comes from the network, so the renderer is constantly loading content in other threads while displaying the current scene. I need minimal slowdown in the refresh thread while this takes place. I'm getting frame rates in the 20 FPS range during heavy content loading. |
This is stale, in need of a champion. |
A rename might help. Like "Fix concurrency performance bug". |
Still a problem. As I said above, 20 FPS. Updating the scene from another thread kills rendering performance. Watch this video. This is the kind of thing I need to render fast while new content is coming in from the server. |
@John-Nagle This is absolutely the kind of thing that One thing you might be in a good position to do that would help enormously to move this forward is to pull together some benchmarks that exhibit the slowdowns you're concerned about. For performance work, it is almost always futile to stare at the code and guess how to make your users' code run faster. Having a realistic load to benchmark and instrument makes it possible to direct one's work effectively. These benchmarks would probably need to be freely redistributable, so that wgpu contributors could download them and work on them. Is that something you might be able to help with? |
For example - we know that the hub RwLocks are contended - that's the point of this bug. But which ones are worth fixing first? I'll bet cleaning up And what if it's not just the |
I have a big system that does a lot in parallel, but not a micro-benchmark. What I have obtains its content from servers, and there are IP issues around how that can be used. We need 1) something self-contained refreshing a scene with a lot of different objects, and 2) other threads busily adding and deleting objects, textures, and materials from the scene. I think Connor Fitzgerald might have a test. Something to mod. It's not a standard Rend3 example, though. Does anyone have something I can work from? Something that generates test meshes, materials, and textures? Thanks. |
I suppose a potential benchmark could be just that: how many draw calls can be recorded in a fixed time. Suppose there are 2 pipelines and 2 bind groups, and each draw call alternates between them, so that the state change isn't optimized out. We could measure this as a function of N threads and see whether the number of draw calls scales accordingly. Literally, as a criterion benchmark. We don't even care about execution, just recording. The hypothesis is that it will not scale, because of the render pass end locks. |
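A rough, std-only model of that benchmark, assuming (as the hypothesis says) that recording takes a shared lock per draw and an exclusive lock at render pass end. `Hub`, `draws_recorded`, and `draws_per_pass` are hypothetical names; this does not call wgpu, and a real version would be a criterion benchmark against wgpu's own render pass API.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a hub: shared state behind one global lock.
/// It models the hypothesis only; it is not wgpu's internal type.
struct Hub {
    pipelines: [u64; 2],
    bind_groups: [u64; 2],
    passes_ended: u64,
}

/// Records alternating draws on `threads` threads for about `duration` and
/// returns the total number of draws. Every `draws_per_pass` draws, a thread
/// "ends its render pass" by taking the hub's write lock.
fn draws_recorded(threads: usize, duration: Duration, draws_per_pass: u64) -> u64 {
    let draws_per_pass = draws_per_pass.max(1); // avoid modulo by zero
    let hub = Arc::new(RwLock::new(Hub {
        pipelines: [1, 2],
        bind_groups: [10, 20],
        passes_ended: 0,
    }));
    let stop = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let hub = Arc::clone(&hub);
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                let (mut draws, mut checksum) = (0u64, 0u64);
                loop {
                    {
                        // Shared lock: look up the state this draw needs.
                        let state = hub.read().unwrap();
                        let k = (draws % 2) as usize; // alternate pipeline and bind group
                        checksum = checksum.wrapping_add(state.pipelines[k] ^ state.bind_groups[k]);
                    }
                    draws += 1;
                    if draws % draws_per_pass == 0 {
                        // Exclusive lock at pass end: serializes every thread.
                        hub.write().unwrap().passes_ended += 1;
                    }
                    if stop.load(Ordering::Relaxed) {
                        break;
                    }
                }
                black_box(checksum); // keep the lookups from being optimized away
                draws
            })
        })
        .collect();
    thread::sleep(duration);
    stop.store(true, Ordering::Relaxed);
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Comparing `draws_recorded(1, d, 64)` with `draws_recorded(4, d, 64)` gives the scaling factor; if the pass-end write lock dominates, four threads should record well under four times as many draws.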
It makes sense to check this specific scaling property. That's a good litmus test for "hub rwlocks are fixed". But we also need to know how that one aspect of performance fits into everything else that goes on in a realistic load. It seems unlikely to me that the hub rwlocks are the only thing we need to know about, if we really want to support cases like the one shown in John-Nagle's video. This is why I want more realistic loads to work with - my guess is that full arcanization isn't necessary, and that arcanization alone isn't sufficient, to bring wgpu to the performance we want. |
There are some Tracy logs of @John-Nagle's program specifically in this rend3 issue. I don't believe resource contention is the direct cause of the slowdown in his case (it is caused by tracking performance with massive bindless bind groups), but once that problem is solved, it will start to show its face again. It's a bit hard to pin the locking on a single culprit from the trace, but it's mainly things fighting against create_texture or write_textures on other threads. As for arcanization itself, we definitely should write a benchmark of command lists encoding against each other, as they should be able to operate fully in parallel. This is a pretty common one people hit, and it has been the direct cause of us losing users. I was planning on writing a minimal wgpu benchmark for tracking performance (maybe reviving kvark's https://github.com/kvark/wgpu-bench), so doing another one for threading would be easy. |
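To illustrate why arcanization should let command encoders run fully in parallel, here is a simplified std-only model contrasting the two styles. It is a sketch under assumptions, not wgpu's implementation: `Buffer`, `Registry`, `ArcEncoder`, and `encode_in_parallel` are hypothetical. In the id-based style every lookup takes a shared registry lock; in the Arc-based style each encoder holds strong references to what it uses, so recording never touches a global lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical resource; real wgpu resources carry far more state.
struct Buffer {
    label: String,
}

/// Id-based style: encoders store ids and resolve them through a shared lock.
struct Registry {
    buffers: RwLock<HashMap<u32, Arc<Buffer>>>,
}

impl Registry {
    fn resolve(&self, id: u32) -> Arc<Buffer> {
        // Every lookup takes the registry lock.
        Arc::clone(&self.buffers.read().unwrap()[&id])
    }
}

/// Arc-based style: the encoder keeps strong references to the buffers it
/// uses, so recording needs no registry lock and each used buffer stays alive
/// until the encoder is finished.
struct ArcEncoder {
    used: Vec<Arc<Buffer>>,
}

impl ArcEncoder {
    fn new() -> Self {
        ArcEncoder { used: Vec::new() }
    }
    fn use_buffer(&mut self, buffer: &Arc<Buffer>) {
        self.used.push(Arc::clone(buffer));
    }
    /// Consumes the encoder and returns how many buffer uses it recorded.
    fn finish(self) -> usize {
        self.used.len()
    }
}

/// Records `draws` buffer uses on each of `threads` encoders in parallel.
fn encode_in_parallel(buffer: Arc<Buffer>, threads: usize, draws: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let buffer = Arc::clone(&buffer);
            thread::spawn(move || {
                let mut encoder = ArcEncoder::new();
                for _ in 0..draws {
                    encoder.use_buffer(&buffer);
                }
                encoder.finish()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The trade-off in this model is reference-count traffic instead of lock traffic: cloning an `Arc` is an atomic increment on the resource itself, which only contends when threads use the same resource.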
I agree a test case is needed, so I'm starting to write one (https://github.com/John-Nagle/render-bench). Nothing there yet; it's just an empty Rust project right now. What I intend to do is generate a city of random blocky objects, all different, and constantly replace objects with other objects from multiple threads while another thread runs the renderer. That should replicate the kind of load I'm putting on the system in my real program. It's not drawing that's the problem. It's updating meshes, materials, and textures on the fly while drawing. See this Rend3 bug report, which has Tracy output. The updating threads are making Rend3 calls which queue up work to be done by the rendering thread, and that's slowing down the rendering thread. A lot. (Why is this important? Because I'm rendering a virtual world in which you can move around, and which is far too big to be in memory all at once. In the video, you see what looks like a static world, but inside the program, content is being frantically loaded and unloaded at various levels of detail as the camera moves.) I'm not clear on how much of this blocking is Rend3 and how much is wgpu, but since Connor Fitzgerald said this PR (#2272) was blocking Rend3 #350, I'm in here talking about this. |
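A minimal sketch of the "city of random blocky objects" idea: each building is an axis-aligned box with random width, depth, and height placed on a grid. Everything here is hypothetical (`Mesh`, `box_mesh`, `city`, and a tiny xorshift generator standing in for the `rand` crate); it only produces plain position and index arrays that would still need to be uploaded through Rend3 or wgpu. Triangles are wound counter-clockwise as seen from outside each box.

```rust
/// A box mesh: 8 corner positions and 12 triangles (36 indices).
struct Mesh {
    positions: Vec<[f32; 3]>,
    indices: Vec<u32>,
}

/// Hypothetical xorshift64 generator; the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    /// Returns a float in [0, 1) built from the top 24 bits of the state.
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Builds a box spanning `min`..`max`. Corner i takes x from bit 0,
/// y from bit 1, and z from bit 2 (0 = min, 1 = max).
fn box_mesh(min: [f32; 3], max: [f32; 3]) -> Mesh {
    let positions = (0..8u32)
        .map(|i| {
            [
                if i & 1 == 0 { min[0] } else { max[0] },
                if i & 2 == 0 { min[1] } else { max[1] },
                if i & 4 == 0 { min[2] } else { max[2] },
            ]
        })
        .collect();
    // Each face as a quad whose right-hand normal points outward:
    // -z, +z, -y, +y, -x, +x.
    let faces: [[u32; 4]; 6] = [
        [0, 2, 3, 1],
        [4, 5, 7, 6],
        [0, 1, 5, 4],
        [2, 6, 7, 3],
        [0, 4, 6, 2],
        [1, 3, 7, 5],
    ];
    let mut indices = Vec::with_capacity(36);
    for f in faces.iter() {
        let [a, b, c, d] = *f;
        indices.extend_from_slice(&[a, b, c, a, c, d]); // two triangles per quad
    }
    Mesh { positions, indices }
}

/// Generates a `side` x `side` grid of distinct buildings on 10-unit lots.
fn city(side: u32, seed: u64) -> Vec<Mesh> {
    let mut rng = XorShift(seed.max(1));
    let mut meshes = Vec::new();
    for gx in 0..side {
        for gz in 0..side {
            let (x, z) = (gx as f32 * 10.0, gz as f32 * 10.0);
            let w = 2.0 + 6.0 * rng.next_f32(); // under 8, so lots never overlap
            let d = 2.0 + 6.0 * rng.next_f32();
            let h = 1.0 + 30.0 * rng.next_f32();
            meshes.push(box_mesh([x, 0.0, z], [x + w, h, z + d]));
        }
    }
    meshes
}
```

Because every building has its own vertex data, nothing is shared between objects, which matches the non-shared meshes the benchmark is meant to stress.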
Dying to see both of these benchmarks (and dying for time to work on perf) |
I may be able to find some time to contribute to this, although a good portion of that will be reviewing the previous discussions and the current diff to get an understanding of all the details. |
Plugging away on my render-bench. Right now, one big brick cube appears. Soon, something more reasonable with content stats similar to the real scenes I've shown. This should make the complex scene case more testable. |
As requested, I have constructed a benchmark/test for this situation. See https://github.com/John-Nagle/render-bench. This creates and removes a large number of non-shared meshes and materials from one thread, while another thread does screen redraws and nothing else. Currently, the frame rate is 60 FPS on the static scene, but drops to 13 FPS during mesh loading. This is built on Rend3, and uses code from the Rend3 examples. |
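For scale, converting the quoted frame rates to frame times:

```math
\frac{1000\ \text{ms}}{60\ \text{FPS}} \approx 16.7\ \text{ms/frame}, \qquad \frac{1000\ \text{ms}}{13\ \text{FPS}} \approx 76.9\ \text{ms/frame}
```

so mesh loading on the other thread adds roughly 60 ms to every frame.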
@John-Nagle Thank you so much for this! The issues reproduce very clearly on my machine and give me a really useful Tracy trace. I will put this all together into a larger-scale tracking issue detailing the various bottlenecks we have right now. |
Thanks. It's good to hear that. Now this is an easily reproducible problem. Cleaned up the test case (warnings, Clippy, format), but the only functional change is that it now prints the number of meshes added and deleted. |
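The shape of this benchmark, one loader thread adding and deleting non-shared meshes while a render thread redraws, with counts of meshes added and deleted, can be modeled without a GPU. This std-only sketch is a stand-in, not render-bench's code: `run_load_test`, `Stats`, and the mutex-guarded scene are hypothetical, and the single lock deliberately plays the role of whatever renderer or wgpu state both threads contend on.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical counters shared by the loader and the render loop.
#[derive(Default)]
struct Stats {
    added: AtomicUsize,
    deleted: AtomicUsize,
    frames: AtomicUsize,
}

/// Adds and then removes `objects` meshes per round on the calling thread
/// while a render thread keeps "drawing" the scene.
/// Returns (meshes added, meshes deleted, frames rendered).
fn run_load_test(objects: u32, rounds: u32) -> (usize, usize, usize) {
    // Hypothetical scene: object id -> mesh bytes, behind one lock.
    let scene: Arc<Mutex<HashMap<u32, Vec<u8>>>> = Arc::new(Mutex::new(HashMap::new()));
    let stats = Arc::new(Stats::default());
    let done = Arc::new(AtomicBool::new(false));

    let renderer = {
        let (scene, stats, done) = (Arc::clone(&scene), Arc::clone(&stats), Arc::clone(&done));
        thread::spawn(move || loop {
            // One "frame": walk the scene under the lock the loader also needs.
            let bytes: usize = scene.lock().unwrap().values().map(|m| m.len()).sum();
            black_box(bytes);
            stats.frames.fetch_add(1, Ordering::Relaxed);
            if done.load(Ordering::Acquire) {
                break; // at least one frame is always rendered
            }
        })
    };

    for round in 0..rounds {
        for id in 0..objects {
            scene.lock().unwrap().insert(id, vec![round as u8; 1024]);
            stats.added.fetch_add(1, Ordering::Relaxed);
        }
        for id in 0..objects {
            scene.lock().unwrap().remove(&id);
            stats.deleted.fetch_add(1, Ordering::Relaxed);
        }
    }
    done.store(true, Ordering::Release);
    renderer.join().unwrap();
    (
        stats.added.load(Ordering::Relaxed),
        stats.deleted.load(Ordering::Relaxed),
        stats.frames.load(Ordering::Relaxed),
    )
}
```

Timing the render loop with and without the loader running gives the same kind of static-versus-loading comparison as the 60 FPS and 13 FPS figures, though in this model only lock contention is measured.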
Considering our change of plans, and the formalization into an issue in the form of #2710, I'm going to close this. |
Co-authored-by: Jim Blandy <[email protected]>
This is what's left from @pythonesque's work.