avoid realloc memory across frame in GpuArrayBuffer #11290

re0312 · 2024-01-10T22:48:37Z

Objective

currently,bevy use GputArrayBuffer for meshUniform,however ,in GpuArrayBuffer::StorageBuffer keeps a buffer will pass to Gpu everyframe that will be drop across frame which is not necessary.

Solution

keep the buffer to avoid realloc

Peformance

Window 11 ,Intel 13400kf, NV 4070Ti
(Only the platform which support StorageBuffer will benefit from this pr)

many cubes
yellow is main , red is pr

frame meantime from 3.85ms to 3.23ms,~16% reduction

hot-spot function : batch_and_prepare_render_phase

meantime from 1.01ms to 0.567ms,almost 100% gain

JMS55

Good spot! Approved, but you need to remove the unused mem import to get CI to pass.

mockersf · 2024-01-11T06:01:44Z

will that memory still be dealloced when it's not needed?

JMS55 · 2024-01-11T06:56:41Z

We'll want to implement a scheme that shrinks the buffer capacity by a certain percentage if a certain percentage is unused when flip flopping. I have similar logic on my meshlet PR.

re0312 · 2024-01-11T07:09:39Z

will that memory still be dealloced when it's not needed?

No, memory usage will remain at the peak level required to hold the maximum number of meshes rendered on screen ,but the actual memory cost is relatively small, for example , 160k meshes (a pretty large number) obly cost 36 * 4B * 160000 ~ 21Mb

mockersf · 2024-01-11T07:12:57Z

Could you add a todo comment in the code for that?

re0312 · 2024-01-11T17:48:32Z

@JMS55 any reasons why we keep two vector in GpuArrayBuff::StorageBuffer ? IMO, the second vector seems unnecessary (or I might be overlooking something important).

james7132

Thanks for making this PR! We can always add a shrink_to_fit function for reclaiming the unused memory.

Elabajaba · 2024-01-12T07:30:58Z

Can't we just call shrink_to_fit before we clear the vec?

james7132 · 2024-01-12T16:10:02Z

Can't we just call shrink_to_fit before we clear the vec?

This would likely cause more reallocations than we actually need, negating a significant portion of the performance gains.

JMS55 · 2024-01-13T05:13:54Z

What I've been doing is when resetting the vec, if there's more than 30% spare capacity, shrink down to 30%.

JMS55 · 2024-01-13T05:16:15Z

any reasons why we keep two vector in GpuArrayBuff::StorageBuffer ? IMO, the second vector seems unnecessary (or I might be overlooking something important).

I don't know. I don't see why we need the second vec instead of just using buffer.get_mut(). Feel free to remove it.

# Objective - Remove Vec as described in #11290 (comment) ## Solution - Rely on StorageBuffer's backing Vec instead --- ## Changelog - GpuArrayBuffer no longer has a redundant backing Vec

Elabajaba · 2024-01-16T21:23:29Z

I don't think this is still applicable after #11368, should probably be closed?

We still probably need a compaction strategy for the buffers now.

re0312 · 2024-01-16T21:51:15Z

I don't think this is still applicable after #11368, should probably be closed?

We still probably need a compaction strategy for the buffers now.

Yeah, I will close it. maybe we need an individual system to free(shrink) all the memory that is no longer needed for render world.

re0312 · 2024-01-16T21:52:28Z

close in favor of #11368

# Objective - since #9685 ,bevy introduce automatic batching of draw commands, - `batch_and_prepare_render_phase` take the responsibility for batching `phaseItem`, - `GetBatchData` trait is used for indentify each phaseitem how to batch. it defines a associated type `Data `used for Query to fetch data from world. - however,the impl of `GetBatchData ` in bevy always set ` type Data=Entity` then we acually get following code `let entity:Entity =query.get(item.entity())` that cause unnecessary overhead . ## Solution - remove associated type `Data ` and `Filter` from `GetBatchData `, - change the type of the `query_item ` parameter in get_batch_data from` Self::Data` to `Entity`. - `batch_and_prepare_render_phase ` no longer takes a query using `F::Data, F::Filter` - `get_batch_data `now returns `Option<(Self::BufferData, Option<Self::CompareData>)>` --- ## Performance based in main merged with #11290 Window 11 ,Intel 13400kf, NV 4070Ti ![image](https://github.com/bevyengine/bevy/assets/45868716/f63b9d98-6aee-4057-a2c7-a2162b2db765) frame time from 3.34ms to 3 ms, ~ 10% ![image](https://github.com/bevyengine/bevy/assets/45868716/a06eea9c-f79e-4324-8392-8d321560c5ba) `batch_and_prepare_render_phase` from 800us ~ 400 us ## Migration Guide trait `GetBatchData` no longer hold associated type `Data `and `Filter` `get_batch_data` `query_item `type from `Self::Data` to `Entity` and return `Option<(Self::BufferData, Option<Self::CompareData>)>` `batch_and_prepare_render_phase` should not have a query

swap

e9b223c

re0312 changed the title ~~avoid realloc memory across data in GpuArrayBuffer~~ avoid realloc memory across frame in GpuArrayBuffer Jan 10, 2024

JMS55 approved these changes Jan 10, 2024

View reviewed changes

nit

8112bef

mockersf added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Jan 11, 2024

todo

7b88aee

mockersf approved these changes Jan 11, 2024

View reviewed changes

james7132 approved these changes Jan 12, 2024

View reviewed changes

re0312 mentioned this pull request Jan 13, 2024

optimize batch_and_prepare_render_phase #11323

Merged

atlv24 mentioned this pull request Jan 16, 2024

Remove Vec from GpuArrayBuffer #11368

Merged

re0312 closed this Jan 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avoid realloc memory across frame in GpuArrayBuffer #11290

avoid realloc memory across frame in GpuArrayBuffer #11290

re0312 commented Jan 10, 2024

JMS55 left a comment

mockersf commented Jan 11, 2024

JMS55 commented Jan 11, 2024

re0312 commented Jan 11, 2024 •

edited

Loading

mockersf commented Jan 11, 2024

re0312 commented Jan 11, 2024

james7132 left a comment

Elabajaba commented Jan 12, 2024

james7132 commented Jan 12, 2024

JMS55 commented Jan 13, 2024

JMS55 commented Jan 13, 2024 •

edited

Loading

Elabajaba commented Jan 16, 2024

re0312 commented Jan 16, 2024

re0312 commented Jan 16, 2024

avoid realloc memory across frame in GpuArrayBuffer #11290

avoid realloc memory across frame in GpuArrayBuffer #11290

Conversation

re0312 commented Jan 10, 2024

Objective

Solution

Peformance

JMS55 left a comment

Choose a reason for hiding this comment

mockersf commented Jan 11, 2024

JMS55 commented Jan 11, 2024

re0312 commented Jan 11, 2024 • edited Loading

mockersf commented Jan 11, 2024

re0312 commented Jan 11, 2024

james7132 left a comment

Choose a reason for hiding this comment

Elabajaba commented Jan 12, 2024

james7132 commented Jan 12, 2024

JMS55 commented Jan 13, 2024

JMS55 commented Jan 13, 2024 • edited Loading

Elabajaba commented Jan 16, 2024

re0312 commented Jan 16, 2024

re0312 commented Jan 16, 2024

re0312 commented Jan 11, 2024 •

edited

Loading

JMS55 commented Jan 13, 2024 •

edited

Loading