
CommandEncoder::run_render_pass takes much longer to execute when there are many vertex and index buffers allocated, regardless of the number submitted to the RenderPass #5514

Closed
RedMindZ opened this issue Apr 10, 2024 · 3 comments
Labels
area: performance How fast things go

Comments

@RedMindZ

RedMindZ commented Apr 10, 2024

Description
The time it takes to execute CommandEncoder::run_render_pass is heavily impacted by the total number of allocated vertex and index buffers, regardless of how many of them are actually submitted to a RenderPass.

Expected vs observed behavior

  • With 2 vertex buffers and 2 index buffers allocated and submitted to the render pass, the call takes ~50 microseconds.
  • With ~2000 vertex buffers and ~2000 index buffers allocated and only 2 vertex buffers and 2 index buffers submitted to the render pass, the call takes ~500 microseconds.

This slowdown can also be observed with a large number of unused bind groups.

I have also confirmed with RenderDoc that the exact same render commands are sent to the GPU.

I am not familiar with the internals of wgpu, but I assumed the time it takes to encode a render pass should only depend on the content of the render pass, and be independent of the total number of GPU objects in existence.

Repro steps
Allocate a small number (e.g. 2) of vertex and index buffers, submit them to a RenderPass, and measure how long the RenderPass's Drop implementation takes to run.
Then allocate a large number (e.g. 2000) of vertex and index buffers, submit the same number as before to a RenderPass, and measure the Drop time again. A sketch of this measurement is shown below.
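
For concreteness, here is a minimal sketch of the measurement, assuming a working wgpu 0.19 setup. The names `device`, `queue`, `view`, `pipeline`, `vertex_buffer`, and `index_buffer` are placeholders, not code from the original report:

```rust
use std::time::Instant;

let mut encoder =
    device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
{
    let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
        label: None,
        color_attachments: &[Some(wgpu::RenderPassColorAttachment {
            view: &view,
            resolve_target: None,
            ops: wgpu::Operations {
                load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                store: wgpu::StoreOp::Store,
            },
        })],
        depth_stencil_attachment: None,
        timestamp_writes: None,
        occlusion_query_set: None,
    });
    render_pass.set_pipeline(&pipeline);
    render_pass.set_vertex_buffer(0, vertex_buffer.slice(..));
    render_pass.set_index_buffer(index_buffer.slice(..), wgpu::IndexFormat::Uint16);
    render_pass.draw_indexed(0..6, 0, 0..1);

    // In wgpu 0.19 the pass is actually encoded when the RenderPass is dropped,
    // so time the drop itself.
    let start = Instant::now();
    drop(render_pass);
    println!("RenderPass drop took {:?}", start.elapsed());
}
queue.submit(Some(encoder.finish()));
```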

Extra materials
I can provide the traces if necessary, but they are very large (over 4GB), so let me know if they are actually necessary.

Platform
bevy 0.13 with wgpu 0.19.1 running on Windows 10 version 10.0.19045 with a GTX 1080 and driver version 536.23. Tested with both DX12 and Vulkan.

@Wumpf
Member

Wumpf commented Apr 10, 2024

Interesting, sounds like hash lookups for the underlying vertex/index buffer ids just take a bit longer, or some other cache-miss-heavy thing occurs. I'm not aware of anything else that would make this slower by design. 500µs seems crazy long for this, though.
Minimal repro code would be appreciated if possible!

Wumpf added the area: performance label on Apr 10, 2024
@RedMindZ
Author

RedMindZ commented Apr 11, 2024

I have created a minimal example based on the hello_triangle example here: https://gist.github.com/RedMindZ/eb1033b0b903d35f5cfba6919bcad25d

The important changes are in main.rs, on lines 57-65 and lines 142-145:

  • Lines 57-65 simply allocate 2M vertex buffers, without ever using them.
  • Lines 142-145 measure the time it takes to drop the render pass.

The rest of the lines are effectively the same as the hello_triangle example. You can resize the window to get it to redraw and encode another render pass. With 2M vertex buffers allocated, it takes about 16ms to encode the render pass.
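
A rough sketch of the kind of change on lines 57-65 (this is not the gist's exact code; the buffer size and count literal are illustrative):

```rust
// Allocate a large number of vertex buffers that are never bound to any pass.
// Only their existence matters for reproducing the slowdown.
let _unused_buffers: Vec<wgpu::Buffer> = (0..2_000_000)
    .map(|_| {
        device.create_buffer(&wgpu::BufferDescriptor {
            label: None,
            size: 16, // arbitrary small size
            usage: wgpu::BufferUsages::VERTEX,
            mapped_at_creation: false,
        })
    })
    .collect();
```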

This minimal example let me profile the code effectively, and the profiler points to the UsageScope struct: specifically, to dropping the UsageScope and calling new on it. The call to new sets the size of the buffers and textures fields, each of which allocates a vector of that size.
Since that size is 2M, allocating and dropping the UsageScope struct takes a while.
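
To illustrate the cost pattern (a simplified sketch only, not wgpu's actual tracker code; the struct and field names are made up):

```rust
// The point is that the per-pass tracking state is sized to the total number
// of live resources, so encoding cost scales with how many buffers exist,
// not with how many the pass actually uses.
struct UsageScopeSketch {
    buffer_states: Vec<Option<u32>>, // one slot per allocated buffer id
}

impl UsageScopeSketch {
    fn new(total_buffers: usize) -> Self {
        // With ~2,000,000 live buffers, building a scope like this per render
        // pass allocates (and later frees) a multi-megabyte vector every time.
        Self { buffer_states: vec![None; total_buffers] }
    }
}
```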

It also looks to me like this is already addressed on the main branch (trunk), where UsageScope uses a pool to allocate those vectors, but it is still an issue in 0.19. Would it be reasonable for me to just use the main branch, or is it too unstable?

Edit: Seems like #5414 was created exactly to address this issue.

@Wumpf
Member

Wumpf commented Apr 12, 2024

Ah 🤦, I didn't realize the connection with #5414 despite having reviewed it myself; I was too fixated on it being about allocation and didn't make the connection of when and where said allocations happen. Thank you so much for following up!
That also means we can close this ticket, as it's solved on trunk (please reopen if your testing shows otherwise).

We actually wanted to do a release very soon, but some of the issues on https://github.com/gfx-rs/wgpu/milestone/19 are still blocking. If you're not bothered by that, I'd even encourage you to use trunk - users that do are often the only way we can be reasonably certain that trunk is ready to release.

Wumpf closed this as not planned on Apr 12, 2024