Declining performance with unreleased 0.3.x vs 0.2.2 #350
This is a longer form comment of what I posted on the matrix: I first want to emphasize that it may take a little bit to iron out all these issues, but they are all definitely fixable in one way or another.
This Tracy trace is actually quite helpful, as it shows me clearly that your program has different performance characteristics from the ones I've been testing, and it gives me some hints about what might be causing it. How many materials do you have and how many textures do you have? Material upload seems to take a bit, and I've previously identified a texture-related bottleneck inside wgpu's run_render_pass, which could explain the performance with mipmapping (more mips = more textures to track).
I currently still do the mesh work on the main thread because there would be synchronization issues if I didn't. The mesh upload is not particularly optimized right now and I have plans for how to improve that. Even if I do split the mesh work out onto another thread, both this and textures becoming truly multithreaded are blocked on gfx-rs/wgpu#2272. This is something I want to get to, but it is quite a large task.
I haven't noticed any performance regressions in my testing, so I want to put together some kind of test case that has similar traits to yours so that I can keep track of performance in this use case. This is also something I want to put together in general as I need to ensure each particular way of using rend3 improves (and doesn't regress) in performance.
Finally, I do want to say that these problems are all fixable. I can't promise it will get done immediately. I'm currently but one person (though @setzer22 has recently joined the project 👋🏻) and have a ton on my plate, but everything will be fixed. My goal, as progress goes on, is that rend3's performance should end up well above what it was in 0.2. I think this is totally achievable. |
Copying some numbers and conclusions from our discussion on the matrix here so I don't lose them. Scene stats:
Todos:
|
"I first want to emphasize that it may take a little bit to iron out all these issues, but they are all definitely fixable in one way or another."
That's good to hear.
"Material upload seems to take a bit"
Most changes to materials already in use are only changes to the texture handles involved. While anything can change, usually most things don't. If an API call for changing only texture handles would help performance, I could make such calls.
"I currently still do the mesh work on the main thread because there would be synchronization issues if I didn't do this."
That explains some things. Back in October when I made that video, I was loading all the meshes with no textures, and then turned on concurrent texture loading. Performance looked good back then. Then I started loading meshes from one thread and textures from another, while refreshing from a third thread. Performance dropped to around 20 FPS at times while meshes and textures were being loaded. Loading textures still degrades the frame rate but not, it seems, as badly as loading meshes.
"I haven't noticed any performance regressions in my testing, so I want to put together some kind of test case that has similar traits to yours so that I can keep track of performance in this use case."
All those non-reused textures and meshes are a problem. But that's user-created content for you. If the NFT metaverse crowd ever actually gets 3D worlds going, they'll face that.
By the way, Unreal Engine 5's Nanite system is heavily dependent on reusing instances of objects. In their world, a mesh is a directed acyclic graph in which subsections of the mesh are shared. Something like a chain-link fence is represented by a very small number of unique mesh parts shared within a single data structure. It's very clever, but their demos rely heavily on instancing. |
Good news, I can reproduce this with a simple code-based test case. 10k meshes/materials/textures repros nicely.
Interesting, I knew about the rendering tech but I never looked too much into how it's actually stored. That makes sense. |
Oh, good. A simple test case always helps. |
Profiling just as texture loading caught up. This shows the difference between frame times while textures are being loaded from other threads, and while they are not. Around 35ms/frame while textures are being loaded, down to 23ms/frame once loading is done. "triage suspected" accounts for some of the difference, but not all of it. |
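For comparison with the FPS figures quoted elsewhere in this thread, those frame times convert to frame rates with simple arithmetic. A minimal sketch; the function names are made up for illustration and are not part of rend3 or Tracy:

```rust
/// Convert a frame time in milliseconds to frames per second.
fn fps_from_frame_ms(frame_ms: f64) -> f64 {
    1000.0 / frame_ms
}

/// Extra time per frame (in ms) spent while assets are still loading,
/// relative to the steady state once loading is done.
fn loading_overhead_ms(loading_ms: f64, idle_ms: f64) -> f64 {
    loading_ms - idle_ms
}
```

With the numbers above, 35 ms/frame is roughly 28.6 FPS and 23 ms/frame is roughly 43.5 FPS, so concurrent texture loading costs about 12 ms per frame in this capture.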
This problem has totally nerd sniped me. Been faffing about in wgpu trying to get performance improvements, and so far have gotten my demo from 39fps up to 100fps. I still need to upstream the changes, which will require being less hacky with my changes, but that should all happen. |
That's great! I'm working on profiling my own stuff now. Tracy isn't showing all my threads, even ones that are using substantial CPU. Not clear why. That capture above should have shown three more threads which do different things, not just the main thread and the multiple asset loader threads. Any ideas? I just started using Tracy and probably missed something. |
Tracy will only show threads that have spans on them, so if you want your threads to show, you need to annotate the work done with spans (you can use the profiling crate for this). |
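To illustrate why an unannotated thread stays invisible, here is a std-only sketch of the span idea. It is a toy stand-in, not Tracy's or the profiling crate's implementation: `Span`, `SpanRecord`, `Sink`, and `run_demo` are hypothetical names, and the thread names are invented. In a real program you would typically put something like `profiling::scope!("load_texture")` at the top of the work you want to see.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// One finished span: which thread ran it, its label, and how long it took.
struct SpanRecord {
    thread: String,
    name: &'static str,
    elapsed: Duration,
}

/// Shared sink standing in for the profiler (hypothetical, not Tracy's API).
type Sink = Arc<Mutex<Vec<SpanRecord>>>;

/// RAII guard: the span is reported when the guard is dropped.
struct Span {
    sink: Sink,
    name: &'static str,
    start: Instant,
}

impl Span {
    fn enter(sink: &Sink, name: &'static str) -> Span {
        Span { sink: Arc::clone(sink), name, start: Instant::now() }
    }
}

impl Drop for Span {
    fn drop(&mut self) {
        let thread = thread::current().name().unwrap_or("unnamed").to_string();
        self.sink.lock().unwrap().push(SpanRecord {
            thread,
            name: self.name,
            elapsed: self.start.elapsed(),
        });
    }
}

/// Run two worker threads; only "asset-loader" opens a span.
/// Returns everything the recorder saw.
fn run_demo() -> Vec<SpanRecord> {
    let sink: Sink = Arc::new(Mutex::new(Vec::new()));

    let annotated = {
        let sink = Arc::clone(&sink);
        thread::Builder::new()
            .name("asset-loader".into())
            .spawn(move || {
                let _span = Span::enter(&sink, "load_texture");
                thread::sleep(Duration::from_millis(1)); // pretend work
            })
            .unwrap()
    };
    // This thread also does work, but never opens a span.
    let silent = thread::Builder::new()
        .name("movement".into())
        .spawn(|| thread::sleep(Duration::from_millis(1)))
        .unwrap();

    annotated.join().unwrap();
    silent.join().unwrap();

    // Take the collected records out of the shared sink.
    let mut guard = sink.lock().unwrap();
    std::mem::take(&mut *guard)
}
```

Calling `run_demo()` yields a single record, from "asset-loader". The "movement" thread used CPU time too, but the recorder never hears about it, which matches the missing threads in the capture above.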
Ah. That's it. Thanks. More profiling data soon. |
More profiling data. Large Tracy file: This is the usual Babbage Palisade scene, from startup through loading to just sitting there refreshing. The part at the end, where the CPU load drops way down, is when the scene is just redrawing without changes. What all those threads are doing:
Notes:
So that's more detail. |
Just giving an update on tracking performance improvements: this unfortunately was a regression for other wgpu projects, so it couldn't be brought in as a whole. That being said, I have some good ideas for improving both cases. For now, I would stick with 0.2 while these get sorted; I can't promise they happen with any speed, given how much is on my plate right now. Will work on the backport shortly. |
Thanks. I've converted over to "unreleased" from a few weeks back, and it's working well, although sluggish on big scenes. I'm working on another part of the system, concurrent mesh loading, and that's keeping me busy. So don't worry about the backport too much. The general speedup is more useful at this point. |
Closing due to #593 |
(Related to #348, but not entirely about memory bloat.)
Frame rate has dropped since 0.2.2.
With Rend3 0.2.2, my program ran at about 58 FPS on this test case, which was fine. It lost about 10 FPS with the new version. Frame rate will drop as low as 23 FPS if peak memory usage exceeds the GPU's memory. It will also drop when other threads are loading textures and meshes.
This is the ideal case for frame rate. The entire scene, all textures and meshes, are all in the GPU. The camera is not moving. The same image is being displayed over and over. No mipmapping. Rend3 Unreleased (pre 0.3.x). Ubuntu 20.04 LTS. Nvidia 3070. Ryzen 5 with 6 cores, 12 hyperthreads.
Here's an initial Tracy profile. Call stacks are not being captured, so this is rather coarse data. My own code is barely doing anything here; it and the window event system are using 32us per frame.
The problems:
I'm attempting to build, in Rust, a client for Second Life / Open Simulator, because the existing C++ OpenGL clients are single-threaded and too slow. Rend3's performance used to be well above the existing programs, but it's now not much better, and in some cases is worse. This is puzzling, because Rend3 is using Vulkan and has all the data resident in the GPU, while the C++ clients are using OpenGL and making huge numbers of draw calls.
I'm very concerned about this. If Rend3's performance doesn't improve substantially, my whole effort was a waste.