Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid realloc memory across frame in GpuArrayBuffer #11290

Closed
wants to merge 3 commits into from

Conversation

re0312
Copy link
Contributor

@re0312 re0312 commented Jan 10, 2024

Objective

  • currently,bevy use GputArrayBuffer for meshUniform,however ,in GpuArrayBuffer::StorageBuffer keeps a buffer will pass to Gpu everyframe that will be drop across frame which is not necessary.

Solution

  • keep the buffer to avoid realloc

Peformance

Window 11 ,Intel 13400kf, NV 4070Ti
(Only the platform which support StorageBuffer will benefit from this pr)

many cubes
yellow is main , red is pr
1
frame meantime from 3.85ms to 3.23ms,~16% reduction

hot-spot function : batch_and_prepare_render_phase
2
meantime from 1.01ms to 0.567ms,almost 100% gain

@re0312 re0312 changed the title avoid realloc memory across data in GpuArrayBuffer avoid realloc memory across frame in GpuArrayBuffer Jan 10, 2024
Copy link
Contributor

@JMS55 JMS55 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot! Approved, but you need to remove the unused mem import to get CI to pass.

@mockersf mockersf added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Jan 11, 2024
@mockersf
Copy link
Member

will that memory still be dealloced when it's not needed?

@JMS55
Copy link
Contributor

JMS55 commented Jan 11, 2024

We'll want to implement a scheme that shrinks the buffer capacity by a certain percentage if a certain percentage is unused when flip flopping. I have similar logic on my meshlet PR.

@re0312
Copy link
Contributor Author

re0312 commented Jan 11, 2024

will that memory still be dealloced when it's not needed?

No, memory usage will remain at the peak level required to hold the maximum number of meshes rendered on screen ,but the actual memory cost is relatively small, for example , 160k meshes (a pretty large number) obly cost 36 * 4B * 160000 ~ 21Mb

@mockersf
Copy link
Member

Could you add a todo comment in the code for that?

@re0312
Copy link
Contributor Author

re0312 commented Jan 11, 2024

@JMS55 any reasons why we keep two vector in GpuArrayBuff::StorageBuffer ? IMO, the second vector seems unnecessary (or I might be overlooking something important).

Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this PR! We can always add a shrink_to_fit function for reclaiming the unused memory.

@Elabajaba
Copy link
Contributor

Can't we just call shrink_to_fit before we clear the vec?

@james7132
Copy link
Member

Can't we just call shrink_to_fit before we clear the vec?

This would likely cause more reallocations than we actually need, negating a significant portion of the performance gains.

@JMS55
Copy link
Contributor

JMS55 commented Jan 13, 2024

What I've been doing is when resetting the vec, if there's more than 30% spare capacity, shrink down to 30%.

@JMS55
Copy link
Contributor

JMS55 commented Jan 13, 2024

any reasons why we keep two vector in GpuArrayBuff::StorageBuffer ? IMO, the second vector seems unnecessary (or I might be overlooking something important).

I don't know. I don't see why we need the second vec instead of just using buffer.get_mut(). Feel free to remove it.

github-merge-queue bot pushed a commit that referenced this pull request Jan 16, 2024
# Objective

- Remove Vec as described in
#11290 (comment)

## Solution

- Rely on StorageBuffer's backing Vec instead

---

## Changelog

- GpuArrayBuffer no longer has a redundant backing Vec
@Elabajaba
Copy link
Contributor

I don't think this is still applicable after #11368, should probably be closed?

We still probably need a compaction strategy for the buffers now.

@re0312
Copy link
Contributor Author

re0312 commented Jan 16, 2024

I don't think this is still applicable after #11368, should probably be closed?

We still probably need a compaction strategy for the buffers now.

Yeah, I will close it. maybe we need an individual system to free(shrink) all the memory that is no longer needed for render world.

@re0312
Copy link
Contributor Author

re0312 commented Jan 16, 2024

close in favor of #11368

@re0312 re0312 closed this Jan 16, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jan 20, 2024
# Objective

- since #9685  ,bevy introduce automatic batching of draw commands, 
- `batch_and_prepare_render_phase` take the responsibility for batching
`phaseItem`,
- `GetBatchData` trait is used for indentify each phaseitem how to
batch. it defines a associated type `Data `used for Query to fetch data
from world.

- however,the impl of `GetBatchData ` in bevy always set ` type
Data=Entity` then we acually get following code
`let entity:Entity =query.get(item.entity())` that cause unnecessary
overhead .

## Solution

- remove associated type `Data ` and `Filter` from `GetBatchData `,
- change the type of the `query_item ` parameter in get_batch_data from`
Self::Data` to `Entity`.
- `batch_and_prepare_render_phase ` no longer takes a query using
`F::Data, F::Filter`
- `get_batch_data `now returns `Option<(Self::BufferData,
Option<Self::CompareData>)>`

---

## Performance
based in main merged with #11290 
Window 11 ,Intel 13400kf, NV 4070Ti

![image](https://github.com/bevyengine/bevy/assets/45868716/f63b9d98-6aee-4057-a2c7-a2162b2db765)
frame time from 3.34ms to 3 ms,  ~ 10%


![image](https://github.com/bevyengine/bevy/assets/45868716/a06eea9c-f79e-4324-8392-8d321560c5ba)
`batch_and_prepare_render_phase` from 800us ~ 400 us  

## Migration Guide
trait `GetBatchData` no longer hold associated type  `Data `and `Filter`
`get_batch_data` `query_item `type from `Self::Data` to `Entity` and
return `Option<(Self::BufferData, Option<Self::CompareData>)>`
`batch_and_prepare_render_phase`  should not have a query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants