CommandEncoder::run_render_pass takes much longer to execute when there are many vertex and index buffers allocated, regardless of the number submitted to the RenderPass
#5514
Labels
area: performance
How fast things go
Description
The time it takes to execute CommandEncoder::run_render_pass is heavily impacted by the total number of allocated vertex and index buffers, regardless of how many of them are actually submitted to a RenderPass.
Expected vs observed behavior
This slowdown can also be observed with a large number of unused bind groups.
I have also confirmed with RenderDoc that the exact same render commands are sent to the GPU.
I am not familiar with the internals of wgpu, but I assumed the time it takes to encode a render pass should depend only on the content of the render pass, and be independent of the total number of GPU objects in existence.
Repro steps
Allocate a small number (e.g. 2) of vertex and index buffers, submit them to a RenderPass, and measure the time it takes for the Drop implementation of RenderPass to run.
Then, allocate a large number (e.g. 2000) of vertex and index buffers, submit the same number as in the previous render pass to a RenderPass, and measure the time it takes for the Drop implementation of RenderPass to run.
Extra materials
I can provide the traces if necessary, but they are very large (over 4GB), so let me know if they are actually necessary.
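In the meantime, the measurement from the repro steps can be sketched with the standard library alone. This is only an illustration of the timing method, not actual wgpu code: FakePass, time_drop, and measure are hypothetical names, and FakePass is a stand-in whose Drop deliberately walks every tracked resource to mimic the symptom described above. It makes no claim about how wgpu is implemented internally.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a render pass. Its Drop visits every tracked
/// resource rather than only the ones the pass used, which models the
/// reported symptom. This is an assumption for illustration, not wgpu code.
struct FakePass {
    tracked_resources: usize,
    visited: Rc<Cell<usize>>,
}

impl Drop for FakePass {
    fn drop(&mut self) {
        // Work proportional to the total number of tracked resources.
        for _ in 0..self.tracked_resources {
            self.visited.set(self.visited.get() + 1);
        }
    }
}

/// Returns how long dropping `value` takes.
fn time_drop<T>(value: T) -> Duration {
    let start = Instant::now();
    drop(value);
    start.elapsed()
}

/// Drops a `FakePass` that tracks `tracked_resources` entries and returns
/// the elapsed time together with the number of entries its Drop visited.
fn measure(tracked_resources: usize) -> (Duration, usize) {
    let visited = Rc::new(Cell::new(0));
    let pass = FakePass {
        tracked_resources,
        visited: Rc::clone(&visited),
    };
    let elapsed = time_drop(pass);
    (elapsed, visited.get())
}
```

Calling measure(2) and then measure(2000) mirrors the two repro steps. In the real repro, the value handed to time_drop would be the RenderPass itself, after the same draw calls have been recorded in both runs.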
Platform
bevy 0.13 with wgpu 0.19.1, running on Windows 10 version 10.0.19045 with a GTX 1080 and driver version 536.23. Tested with both DX12 and Vulkan.