Avoid reallocating memory across frames in GpuArrayBuffer #11290
Good spot! Approved, but you need to remove the unused mem import to get CI to pass.
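Here is a minimal, library-free sketch of the reallocation the title refers to. It assumes something the thread does not show: that the per-frame staging `Vec` used to be handed off with `std::mem::take`. That call leaves a zero-capacity `Vec` behind, so every frame allocates from scratch. Clearing in place keeps the capacity for the next frame instead. `simulate` and `upload` are hypothetical names, not Bevy code.

```rust
use std::mem;

/// Hypothetical stand-in for writing data to a GPU buffer; it only counts elements.
fn upload(data: &[u32]) -> usize {
    data.len()
}

/// Simulates `frames` frames that each push `n` items into a staging Vec and
/// upload them. Returns (frames that started with too little capacity and so
/// had to allocate while pushing, total uploaded items).
fn simulate(frames: usize, n: usize, take_each_frame: bool) -> (usize, usize) {
    let mut staging: Vec<u32> = Vec::new();
    let mut allocating_frames = 0;
    let mut uploaded = 0;
    for _ in 0..frames {
        if staging.capacity() < n {
            allocating_frames += 1;
        }
        staging.extend(0..n as u32);
        if take_each_frame {
            // Assumed "before" behavior: moving the Vec out leaves a zero-capacity
            // Vec behind; the moved allocation is freed when `owned` is dropped.
            let owned = mem::take(&mut staging);
            uploaded += upload(&owned);
        } else {
            // Reuse: upload from the slice, then clear (len -> 0, capacity kept).
            uploaded += upload(&staging);
            staging.clear();
        }
    }
    (allocating_frames, uploaded)
}

fn main() {
    let (take_allocs, _) = simulate(10, 1000, true);
    let (clear_allocs, _) = simulate(10, 1000, false);
    println!("mem::take: {take_allocs} allocating frames");
    println!("clear:     {clear_allocs} allocating frames");
}
```

With `mem::take`, all 10 frames allocate. With `clear`, only the first frame does, because later frames reuse the retained capacity.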
Will that memory still be deallocated when it's no longer needed?
We'll want to implement a scheme that shrinks the buffer capacity by a certain percentage if a certain percentage of it is unused when flip-flopping. I have similar logic in my meshlet PR.
No. Memory usage will stay at the peak level needed to hold the maximum number of meshes rendered on screen, but the actual memory cost is relatively small. For example, 160k meshes (a pretty large number) only cost 36 * 4 B * 160,000 ≈ 23 MB (about 22 MiB).
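The estimate above is plain arithmetic, and a tiny sketch makes it reproducible. It takes the comment's per-mesh figure of 36 four-byte values as given; the thread does not say what those 36 values are. `buffer_bytes` is a hypothetical helper.

```rust
/// Bytes held by a buffer of `count` elements, each made of `words`
/// 32-bit (4-byte) values. The 36-word figure comes from the comment above.
fn buffer_bytes(count: usize, words: usize) -> usize {
    count * words * std::mem::size_of::<u32>()
}

fn main() {
    let bytes = buffer_bytes(160_000, 36);
    // 23,040,000 bytes, i.e. about 23 MB or 21.97 MiB.
    println!("{} bytes = {:.2} MiB", bytes, bytes as f64 / (1024.0 * 1024.0));
}
```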
Could you add a TODO comment in the code for that?
@JMS55, is there a reason we keep two vectors in `GpuArrayBuffer::StorageBuffer`? IMO, the second vector seems unnecessary (though I might be overlooking something important).
Thanks for making this PR! We can always add a `shrink_to_fit` function to reclaim the unused memory.
Can't we just call
This would likely cause more reallocations than we actually need, negating a significant portion of the performance gains.
What I've been doing is this: when resetting the Vec, if there's more than 30% spare capacity, shrink it down to 30%.
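Below is a minimal sketch of the reset-and-shrink policy described above. It is written against a plain `Vec` rather than any Bevy type. It assumes that "30% spare" means spare capacity measured relative to the length used in the frame being reset, which the comment does not pin down. `reset_with_shrink` is a hypothetical name.

```rust
/// Hypothetical helper: clears `v` for the next frame, but first shrinks it if
/// its spare capacity exceeds 30% of the length used this frame.
/// (Measuring "30% spare" relative to the used length is an assumption.)
fn reset_with_shrink<T>(v: &mut Vec<T>) {
    let used = v.len();
    // Keep 30% headroom above this frame's usage so that a similar-sized
    // next frame can be filled without reallocating.
    let target = used + used * 3 / 10;
    if v.capacity() > target {
        // Capacity stays at least max(len, target); it may remain somewhat larger.
        v.shrink_to(target);
    }
    v.clear();
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(1000);
    v.extend(0..100);
    reset_with_shrink(&mut v);
    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```

Shrinking only when a threshold is crossed, and only down to a headroom rather than all the way like `shrink_to_fit`, is what avoids the extra reallocations raised in the previous comment. Note that a frame with zero items frees the whole allocation under this rule. A real implementation might keep a minimum floor.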
I don't know. I don't see why we need the second Vec instead of just using `buffer.get_mut()`. Feel free to remove it.
# Objective
- Remove Vec as described in #11290 (comment)
## Solution
- Rely on StorageBuffer's backing Vec instead
---
## Changelog
- GpuArrayBuffer no longer has a redundant backing Vec
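The follow-up change keeps only the storage buffer's own Vec. Here is a minimal sketch of that shape. It uses hypothetical stand-in types (`StorageBufferStub`, `ArrayBufferSketch`) rather than Bevy's real `StorageBuffer` or `GpuArrayBuffer`; only the idea of going through `get_mut()` comes from the discussion.

```rust
/// Hypothetical stand-in for a GPU storage buffer that owns its CPU-side value.
/// Only the `get`/`get_mut` accessor idea is taken from the discussion.
struct StorageBufferStub<T> {
    value: T,
}

impl<T> StorageBufferStub<T> {
    fn new(value: T) -> Self {
        Self { value }
    }
    fn get(&self) -> &T {
        &self.value
    }
    fn get_mut(&mut self) -> &mut T {
        &mut self.value
    }
}

/// Hypothetical array buffer with a single backing Vec: items are written
/// straight into the storage buffer's own Vec, so no second Vec is kept.
struct ArrayBufferSketch<T> {
    buffer: StorageBufferStub<Vec<T>>,
}

impl<T> ArrayBufferSketch<T> {
    fn new() -> Self {
        Self { buffer: StorageBufferStub::new(Vec::new()) }
    }

    /// Pushes an item and returns its index in the buffer.
    fn push(&mut self, item: T) -> usize {
        let vec = self.buffer.get_mut();
        vec.push(item);
        vec.len() - 1
    }

    /// Empties the buffer for the next frame; the Vec keeps its capacity.
    fn clear(&mut self) {
        self.buffer.get_mut().clear();
    }

    fn len(&self) -> usize {
        self.buffer.get().len()
    }

    fn capacity(&self) -> usize {
        self.buffer.get().capacity()
    }
}

fn main() {
    let mut buf = ArrayBufferSketch::new();
    let first = buf.push(1.0_f32);
    let second = buf.push(2.0_f32);
    buf.clear();
    println!(
        "indices {first}, {second}; len after clear = {}, capacity kept = {}",
        buf.len(),
        buf.capacity()
    );
}
```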
I don't think this is still applicable after #11368; should it be closed? We probably still need a compaction strategy for the buffers, though.
Yeah, I'll close it. Maybe we need a separate system that frees (shrinks) all the memory the render world no longer needs.
Closed in favor of #11368.
# Objective
- Since #9685, Bevy has automatic batching of draw commands.
- `batch_and_prepare_render_phase` is responsible for batching `PhaseItem`s.
- The `GetBatchData` trait defines how each phase item is batched. It has an associated type `Data` that is used by a query to fetch data from the world.
- However, every `GetBatchData` impl in Bevy sets `type Data = Entity`, so we effectively end up with `let entity: Entity = query.get(item.entity())`, which causes unnecessary overhead.
## Solution
- Remove the associated types `Data` and `Filter` from `GetBatchData`.
- Change the type of the `query_item` parameter in `get_batch_data` from `Self::Data` to `Entity`.
- `batch_and_prepare_render_phase` no longer takes a query using `F::Data, F::Filter`.
- `get_batch_data` now returns `Option<(Self::BufferData, Option<Self::CompareData>)>`.
---
## Performance
Based on main merged with #11290. Windows 11, Intel 13400KF, NVIDIA 4070 Ti.
![image](https://github.com/bevyengine/bevy/assets/45868716/f63b9d98-6aee-4057-a2c7-a2162b2db765)
Frame time went from 3.34 ms to 3 ms, a ~10% reduction.
![image](https://github.com/bevyengine/bevy/assets/45868716/a06eea9c-f79e-4324-8392-8d321560c5ba)
`batch_and_prepare_render_phase` went from 800 µs to ~400 µs.
## Migration Guide
- The trait `GetBatchData` no longer has the associated types `Data` and `Filter`.
- In `get_batch_data`, the `query_item` type changed from `Self::Data` to `Entity`, and the function now returns `Option<(Self::BufferData, Option<Self::CompareData>)>`.
- `batch_and_prepare_render_phase` should no longer take a query.
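To make the Solution concrete, here is a simplified, library-free sketch of the post-change shape. Batch data is fetched from an `Entity` directly, and consecutive items are merged into one batch while their compare data matches. `GetBatchDataSketch`, `MeshBatcher`, `batch`, and the `u32` `Entity` alias are hypothetical stand-ins, not Bevy's API. The rule that `None` compare data never batches with a neighbor is an assumption about the intended semantics.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Bevy's `Entity`.
type Entity = u32;

/// Hypothetical, simplified trait: no associated query `Data`/`Filter`;
/// batch data is looked up directly from an `Entity`.
trait GetBatchDataSketch {
    type Param;
    type BufferData;
    type CompareData: PartialEq;
    fn get_batch_data(
        param: &Self::Param,
        entity: Entity,
    ) -> Option<(Self::BufferData, Option<Self::CompareData>)>;
}

struct MeshBatcher;

impl GetBatchDataSketch for MeshBatcher {
    /// entity -> (transform slot, mesh id)
    type Param = HashMap<Entity, (u32, u32)>;
    type BufferData = u32;
    type CompareData = u32;
    fn get_batch_data(param: &Self::Param, entity: Entity) -> Option<(u32, Option<u32>)> {
        param.get(&entity).map(|&(slot, mesh)| (slot, Some(mesh)))
    }
}

/// Pushes each item's buffer data and groups consecutive items with equal
/// `Some` compare data into one batch. Returns (buffer, batch sizes).
fn batch<F: GetBatchDataSketch>(param: &F::Param, items: &[Entity]) -> (Vec<F::BufferData>, Vec<usize>) {
    let mut buffer = Vec::new();
    let mut batches: Vec<usize> = Vec::new();
    let mut last: Option<F::CompareData> = None;
    for &entity in items {
        // Items without batch data are skipped entirely.
        let Some((data, compare)) = F::get_batch_data(param, entity) else { continue };
        buffer.push(data);
        match (&compare, &last) {
            // `last` is only Some after a batch was pushed, so `unwrap` is safe.
            (Some(c), Some(l)) if c == l => *batches.last_mut().unwrap() += 1,
            _ => batches.push(1),
        }
        last = compare;
    }
    (buffer, batches)
}

fn main() {
    let mut param: HashMap<Entity, (u32, u32)> = HashMap::new();
    param.insert(1, (0, 7)); // entity 1: slot 0, mesh 7
    param.insert(2, (1, 7)); // entity 2: same mesh, batches with entity 1
    param.insert(3, (2, 9)); // entity 3: different mesh, starts a new batch
    // Entity 4 has no data and is skipped.
    let (buffer, batches) = batch::<MeshBatcher>(&param, &[1, 2, 3, 4]);
    println!("buffer = {buffer:?}, batch sizes = {batches:?}");
}
```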
Objective
Solution
Performance
Windows 11, Intel 13400KF, NVIDIA 4070 Ti
(Only platforms that support storage buffers will benefit from this PR.)
many_cubes example
Yellow is main, red is this PR.
Mean frame time went from 3.85 ms to 3.23 ms, a ~16% reduction.
Hot-spot function: batch_and_prepare_render_phase
Mean time went from 1.01 ms to 0.567 ms, a ~44% reduction (roughly 1.8× faster).
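The relative changes quoted above can be checked with a few lines of arithmetic. The timings are the ones reported in this PR, and the helper names are illustrative only.

```rust
/// Relative reduction in time, as a percentage of the baseline.
fn reduction_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

/// Speedup factor: how many times faster the new timing is.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

fn main() {
    // Mean frame time and batch_and_prepare_render_phase timings from the PR.
    println!("frame: {:.1}% less time", reduction_pct(3.85, 3.23));
    println!(
        "batch: {:.1}% less time, {:.2}x faster",
        reduction_pct(1.01, 0.567),
        speedup(1.01, 0.567)
    );
}
```

This prints about 16.1% for the frame time, and 43.9% (1.78x) for `batch_and_prepare_render_phase`. That is a large gain, but less than a 100% reduction or a 2x speedup.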