
Make the pointers in Fetch and Query's iterators unions #6396

Closed
wants to merge 7 commits into from


james7132 (Member) commented Oct 28, 2022

This is a redo of #5085 and an extension of #4800.

Objective

Every Fetch struct initializes pointers for both storages, even though only one is ever used. This makes the struct bigger and adds a minuscule amount of overhead when initializing a Fetch, since the fields for the unused storage still have to be zeroed out.

Solution

Add a compile-time discriminated union called StorageSwitch, which holds the different ways a Fetch can represent a pointer to its respective storage. The pointers share the same space, so none of the types carry redundant unused memory. The sole exception is WriteFetch, which has one usize of unused space when used with sparse-set components. ReadFetch should be the same size as a normal pointer.
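A minimal sketch of the idea, assuming a simplified `Component` trait that exposes a `STORAGE_TYPE` constant. `StorageType`, `new`, `extract`, `Position`, and `Marker` are illustrative names, not necessarily Bevy's actual API. Because the active field is chosen by a constant rather than a stored tag, the union is only as large as its largest field:

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a component's storage kind (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StorageType {
    Table,
    SparseSet,
}

/// Hypothetical component trait exposing the storage kind as a constant.
pub trait Component {
    const STORAGE_TYPE: StorageType;
}

/// Holds either table data `T` or sparse-set data `S`, never both. The active
/// field is decided by `C::STORAGE_TYPE`, a compile-time constant, so the
/// union carries no runtime tag.
pub union StorageSwitch<C: Component, T: Copy, S: Copy> {
    table: T,
    sparse_set: S,
    _marker: PhantomData<C>,
}

impl<C: Component, T: Copy, S: Copy> StorageSwitch<C, T, S> {
    /// Initializes only the field that matches the component's storage type;
    /// the other closure is never called.
    pub fn new(table: impl FnOnce() -> T, sparse_set: impl FnOnce() -> S) -> Self {
        match C::STORAGE_TYPE {
            StorageType::Table => Self { table: table() },
            StorageType::SparseSet => Self { sparse_set: sparse_set() },
        }
    }

    /// Reads the active field and passes it to the matching closure.
    pub fn extract<R>(&self, table: impl FnOnce(T) -> R, sparse_set: impl FnOnce(S) -> R) -> R {
        match C::STORAGE_TYPE {
            // SAFETY: `new` wrote the field selected by this same constant.
            StorageType::Table => table(unsafe { self.table }),
            StorageType::SparseSet => sparse_set(unsafe { self.sparse_set }),
        }
    }
}

/// Example components: one stored in tables, one in sparse sets.
pub struct Position;
impl Component for Position {
    const STORAGE_TYPE: StorageType = StorageType::Table;
}

pub struct Marker;
impl Component for Marker {
    const STORAGE_TYPE: StorageType = StorageType::SparseSet;
}
```

With two pointer-sized fields, the union is one pointer wide. If the table variant instead held two pointers while the sparse-set variant held one, the union would be two words wide and the sparse-set case would leave one word unused, which is the kind of gap the WriteFetch caveat describes.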

Because sparse sets always have a reference populated when a fetch is made, they are no longer wrapped in an Option and do not need any unwrapping.
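To illustrate the Option removal, here is a hypothetical before/after using a toy `SparseSet` (not Bevy's `ComponentSparseSet`); the type names and fields are assumptions made for the sketch:

```rust
/// Hypothetical sparse set storing one `u32` per entity index (illustrative only).
struct SparseSet {
    dense: Vec<Option<u32>>,
}

impl SparseSet {
    fn get(&self, entity: usize) -> Option<u32> {
        self.dense.get(entity).copied().flatten()
    }
}

/// Before: the sparse-set reference is optional, so every access unwraps it,
/// even though it is always `Some` for sparse components.
struct SparseFetchBefore<'w> {
    sparse_set: Option<&'w SparseSet>,
}

impl<'w> SparseFetchBefore<'w> {
    fn fetch(&self, entity: usize) -> Option<u32> {
        self.sparse_set
            .expect("sparse set is always populated")
            .get(entity)
    }
}

/// After: the reference is stored directly, so no unwrap is needed.
struct SparseFetchAfter<'w> {
    sparse_set: &'w SparseSet,
}

impl<'w> SparseFetchAfter<'w> {
    fn fetch(&self, entity: usize) -> Option<u32> {
        self.sparse_set.get(entity)
    }
}
```

Since `Option<&T>` is guaranteed to be the same size as `&T`, this particular change removes the unwrap rather than saving space.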

Co-Authored-By: Boxy [email protected]

@james7132 james7132 requested review from BoxyUwU October 28, 2022 12:10
@james7132 james7132 added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Oct 28, 2022
```rust
archetype_entities: &[],
table_id_iter: query_state.matched_table_ids.iter(),
archetype_id_iter: query_state.matched_archetype_ids.iter(),
id_iter: if Self::IS_DENSE {
```
Member

This should be a match statement to make adding new storage types less error-prone.

Member

IMO we should actually probably replace the IS_DENSE bool with an enum?
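A sketch of what the suggestion could look like, assuming a hypothetical `StorageKind` enum in place of the `IS_DENSE` bool. With an exhaustive `match`, adding a variant turns every unhandled site into a compile error rather than silently falling into an `else` branch:

```rust
/// Hypothetical enum replacing a `const IS_DENSE: bool` (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageKind {
    Dense,
    Sparse,
}

/// Picks which matched-ID list to iterate. If a new `StorageKind` variant is
/// added, this `match` stops compiling until the new case is handled.
fn iteration_source(kind: StorageKind) -> &'static str {
    match kind {
        StorageKind::Dense => "matched_table_ids",
        StorageKind::Sparse => "matched_archetype_ids",
    }
}
```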

Member Author

It'd make the const evals of IS_DENSE for fetches notably harder to read than the current boolean composition.

Also I thought we were avoiding match bool?
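For context, a sketch of the boolean composition being referred to, using a hypothetical `Fetch` trait. A tuple is dense only if every member is dense, which is a single `&&` with a bool, while an enum needs a helper `const fn` (`Density` and `combine` are illustrative names):

```rust
/// Hypothetical fetch trait with a boolean density flag.
trait Fetch {
    const IS_DENSE: bool;
}

struct TableFetch;
struct SparseFetch;

impl Fetch for TableFetch {
    const IS_DENSE: bool = true;
}
impl Fetch for SparseFetch {
    const IS_DENSE: bool = false;
}

// A tuple of fetches is dense only if every member is dense:
// with a bool this is one `&&` in a const expression.
impl<A: Fetch, B: Fetch> Fetch for (A, B) {
    const IS_DENSE: bool = A::IS_DENSE && B::IS_DENSE;
}

/// The enum equivalent: `&&` is not defined for enums, so the
/// composition needs its own `const fn`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Density {
    Dense,
    Sparse,
}

const fn combine(a: Density, b: Density) -> Density {
    match (a, b) {
        (Density::Dense, Density::Dense) => Density::Dense,
        _ => Density::Sparse,
    }
}
```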

Member

Yeah: that's part of why I suggested an enum ;)

I agree on the const evals though, so I'm fine to leave it. Adding new storage types will be very rare, and even then they'll likely be meaningfully characterized as dense or not.

alice-i-cecile (Member) left a comment

Well done; needs more docs. The underlying strategy of using a union here, discriminated by compile-time properties, makes a lot of sense to me.

I was a bit nervous that the added complexity would make adding more storage types harder in the future, but overall this seems like it pushes us in the right direction there. More type safety, more clarity, and no per-storage-mode overhead.

Can you run some benchmarks? I'm curious to see if this makes a difference. I think we should do this even if it's perf neutral though: it's pretty clearly the right representation.

james7132 (Member Author) commented Nov 1, 2022

Did a quick round of microbenchmarks. For the benchmarks that aren't within the noise threshold, the changes range from a small regression to a sizable gain.

group                                                    main                                     union-fetch
-----                                                    ----                                     -----------
busy_systems/01x_entities_03_systems                     1.00     33.5±2.43µs        ? ?/sec      1.01     34.0±0.96µs        ? ?/sec
busy_systems/01x_entities_06_systems                     1.00     69.8±2.02µs        ? ?/sec      1.03     71.6±5.10µs        ? ?/sec
busy_systems/01x_entities_09_systems                     1.00     98.3±4.70µs        ? ?/sec      1.02    100.0±4.52µs        ? ?/sec
busy_systems/01x_entities_12_systems                     1.00    126.7±3.60µs        ? ?/sec      1.04    131.4±4.25µs        ? ?/sec
busy_systems/01x_entities_15_systems                     1.00    155.9±6.21µs        ? ?/sec      1.05    163.8±7.71µs        ? ?/sec
busy_systems/02x_entities_03_systems                     1.01     59.9±3.34µs        ? ?/sec      1.00     59.0±2.75µs        ? ?/sec
busy_systems/02x_entities_06_systems                     1.00    113.3±3.29µs        ? ?/sec      1.06    120.6±4.96µs        ? ?/sec
busy_systems/02x_entities_09_systems                     1.05    177.0±8.24µs        ? ?/sec      1.00   168.0±11.32µs        ? ?/sec
busy_systems/02x_entities_12_systems                     1.06    231.3±9.28µs        ? ?/sec      1.00    219.0±8.70µs        ? ?/sec
busy_systems/02x_entities_15_systems                     1.00   283.6±10.12µs        ? ?/sec      1.00   282.5±10.15µs        ? ?/sec
busy_systems/03x_entities_03_systems                     1.12     91.6±6.23µs        ? ?/sec      1.00     81.9±2.88µs        ? ?/sec
busy_systems/03x_entities_06_systems                     1.00    156.5±3.97µs        ? ?/sec      1.03    160.9±7.40µs        ? ?/sec
busy_systems/03x_entities_09_systems                     1.00    243.5±8.78µs        ? ?/sec      1.04   252.9±12.86µs        ? ?/sec
busy_systems/03x_entities_12_systems                     1.03   324.6±12.89µs        ? ?/sec      1.00   316.2±10.93µs        ? ?/sec
busy_systems/03x_entities_15_systems                     1.01   433.6±23.47µs        ? ?/sec      1.00   428.9±19.70µs        ? ?/sec
busy_systems/04x_entities_03_systems                     1.00    113.2±5.39µs        ? ?/sec      1.06    119.8±5.91µs        ? ?/sec
busy_systems/04x_entities_06_systems                     1.08   226.0±10.21µs        ? ?/sec      1.00    209.7±6.34µs        ? ?/sec
busy_systems/04x_entities_09_systems                     1.00   319.1±14.03µs        ? ?/sec      1.04   330.8±16.38µs        ? ?/sec
busy_systems/04x_entities_12_systems                     1.00   428.7±22.36µs        ? ?/sec      1.01   434.5±22.52µs        ? ?/sec
busy_systems/04x_entities_15_systems                     1.01   519.3±15.26µs        ? ?/sec      1.00   515.7±12.83µs        ? ?/sec
busy_systems/05x_entities_03_systems                     1.00    132.8±3.59µs        ? ?/sec      1.12    149.1±7.46µs        ? ?/sec
busy_systems/05x_entities_06_systems                     1.00    274.9±9.29µs        ? ?/sec      1.04   286.4±14.12µs        ? ?/sec
busy_systems/05x_entities_09_systems                     1.00   408.4±19.32µs        ? ?/sec      1.02   416.5±14.63µs        ? ?/sec
busy_systems/05x_entities_12_systems                     1.00   538.5±19.67µs        ? ?/sec      1.00   539.7±25.79µs        ? ?/sec
busy_systems/05x_entities_15_systems                     1.00   683.8±29.28µs        ? ?/sec      1.04   711.9±26.70µs        ? ?/sec
contrived/01x_entities_03_systems                        1.00     19.1±1.06µs        ? ?/sec      1.14     21.8±1.61µs        ? ?/sec
contrived/01x_entities_06_systems                        1.00     41.7±2.82µs        ? ?/sec      1.04     43.4±3.31µs        ? ?/sec
contrived/01x_entities_09_systems                        1.00     61.1±5.27µs        ? ?/sec      1.01     61.9±5.11µs        ? ?/sec
contrived/01x_entities_12_systems                        1.00     79.9±4.21µs        ? ?/sec      1.01     80.4±5.74µs        ? ?/sec
contrived/01x_entities_15_systems                        1.04    100.4±6.35µs        ? ?/sec      1.00     96.7±5.76µs        ? ?/sec
contrived/02x_entities_03_systems                        1.07     34.3±2.47µs        ? ?/sec      1.00     32.0±1.29µs        ? ?/sec
contrived/02x_entities_06_systems                        1.03     64.9±2.43µs        ? ?/sec      1.00     63.3±2.35µs        ? ?/sec
contrived/02x_entities_09_systems                        1.00     91.4±4.74µs        ? ?/sec      1.04     94.7±5.66µs        ? ?/sec
contrived/02x_entities_12_systems                        1.00    120.5±4.77µs        ? ?/sec      1.02    122.6±5.00µs        ? ?/sec
contrived/02x_entities_15_systems                        1.02    151.3±5.01µs        ? ?/sec      1.00    148.8±6.04µs        ? ?/sec
contrived/03x_entities_03_systems                        1.00     41.3±2.03µs        ? ?/sec      1.02     42.2±1.36µs        ? ?/sec
contrived/03x_entities_06_systems                        1.00     85.7±4.16µs        ? ?/sec      1.03     87.9±2.46µs        ? ?/sec
contrived/03x_entities_09_systems                        1.00    122.5±4.35µs        ? ?/sec      1.04    127.6±4.41µs        ? ?/sec
contrived/03x_entities_12_systems                        1.00    164.0±4.80µs        ? ?/sec      1.03   169.1±10.97µs        ? ?/sec
contrived/03x_entities_15_systems                        1.01    205.6±5.65µs        ? ?/sec      1.00    203.9±7.47µs        ? ?/sec
contrived/04x_entities_03_systems                        1.00     52.1±2.54µs        ? ?/sec      1.08     56.0±4.35µs        ? ?/sec
contrived/04x_entities_06_systems                        1.00    102.0±4.89µs        ? ?/sec      1.07    109.0±6.46µs        ? ?/sec
contrived/04x_entities_09_systems                        1.00    149.0±6.53µs        ? ?/sec      1.10    163.2±7.29µs        ? ?/sec
contrived/04x_entities_12_systems                        1.00    200.6±7.55µs        ? ?/sec      1.01    203.5±5.19µs        ? ?/sec
contrived/04x_entities_15_systems                        1.00    245.4±6.65µs        ? ?/sec      1.08   265.2±11.06µs        ? ?/sec
contrived/05x_entities_03_systems                        1.05     63.4±6.07µs        ? ?/sec      1.00     60.3±2.13µs        ? ?/sec
contrived/05x_entities_06_systems                        1.00    130.1±6.39µs        ? ?/sec      1.04   135.2±11.87µs        ? ?/sec
contrived/05x_entities_09_systems                        1.07    201.2±5.55µs        ? ?/sec      1.00    188.5±4.21µs        ? ?/sec
contrived/05x_entities_12_systems                        1.00    252.5±7.71µs        ? ?/sec      1.02   257.4±11.58µs        ? ?/sec
contrived/05x_entities_15_systems                        1.00    306.5±9.88µs        ? ?/sec      1.01   308.9±17.54µs        ? ?/sec
heavy_compute/base                                       1.00    354.1±2.85µs        ? ?/sec      1.00    354.9±3.51µs        ? ?/sec
iter_fragmented/base                                     1.02   350.5±57.21ns        ? ?/sec      1.00   343.8±13.77ns        ? ?/sec
iter_fragmented/foreach                                  1.02   246.3±24.38ns        ? ?/sec      1.00   241.7±19.65ns        ? ?/sec
iter_fragmented/foreach_wide                             1.03      4.0±0.29µs        ? ?/sec      1.00      3.9±0.13µs        ? ?/sec
iter_fragmented/wide                                     1.00      4.5±0.13µs        ? ?/sec      1.01      4.6±0.21µs        ? ?/sec
iter_fragmented_sparse/base                              1.00     10.4±0.63ns        ? ?/sec      1.09     11.4±2.23ns        ? ?/sec
iter_fragmented_sparse/foreach                           1.00      9.2±0.39ns        ? ?/sec      1.13     10.4±2.43ns        ? ?/sec
iter_fragmented_sparse/foreach_wide                      1.01     42.3±1.98ns        ? ?/sec      1.00     42.0±2.10ns        ? ?/sec
iter_fragmented_sparse/wide                              1.00     52.1±4.85ns        ? ?/sec      1.00     52.3±2.69ns        ? ?/sec
iter_simple/base                                         1.00     11.0±0.33µs        ? ?/sec      1.01     11.1±1.17µs        ? ?/sec
iter_simple/foreach                                      1.01     10.9±0.03µs        ? ?/sec      1.00     10.8±0.08µs        ? ?/sec
iter_simple/foreach_sparse_set                           1.01     42.4±1.66µs        ? ?/sec      1.00     42.1±0.34µs        ? ?/sec
iter_simple/foreach_wide                                 1.03     47.4±0.30µs        ? ?/sec      1.00     45.9±1.64µs        ? ?/sec
iter_simple/foreach_wide_sparse_set                      1.00    232.4±5.91µs        ? ?/sec      1.01    234.2±2.10µs        ? ?/sec
iter_simple/sparse_set                                   1.03     52.3±1.98µs        ? ?/sec      1.00     50.7±0.32µs        ? ?/sec
iter_simple/system                                       1.01     11.1±0.43µs        ? ?/sec      1.00     11.0±0.04µs        ? ?/sec
iter_simple/wide                                         1.11     58.4±0.54µs        ? ?/sec      1.00     52.8±0.78µs        ? ?/sec
iter_simple/wide_sparse_set                              1.00    235.9±3.71µs        ? ?/sec      1.01    237.3±0.97µs        ? ?/sec
query_get/50000_entities_sparse                          1.01   724.6±38.37µs        ? ?/sec      1.00   717.9±72.84µs        ? ?/sec
query_get/50000_entities_table                           1.02   507.6±36.54µs        ? ?/sec      1.00   499.2±14.69µs        ? ?/sec
query_get_component/50000_entities_sparse                1.09  1280.3±137.26µs        ? ?/sec     1.00  1176.3±21.57µs        ? ?/sec
query_get_component/50000_entities_table                 1.27  1411.2±113.85µs        ? ?/sec     1.00  1112.3±49.62µs        ? ?/sec
query_get_component_simple/system                        1.01   774.5±65.15µs        ? ?/sec      1.00   764.4±25.69µs        ? ?/sec
query_get_component_simple/unchecked                     1.01   974.9±21.21µs        ? ?/sec      1.00   962.8±13.91µs        ? ?/sec
world_entity/50000_entities                              1.00    424.0±0.35µs        ? ?/sec      1.01    427.8±2.00µs        ? ?/sec
world_get/50000_entities_sparse                          1.00    560.7±2.54µs        ? ?/sec      1.00    563.1±4.53µs        ? ?/sec
world_get/50000_entities_table                           1.00   922.5±13.24µs        ? ?/sec      1.01   929.7±27.49µs        ? ?/sec
world_query_for_each/50000_entities_sparse               1.00     84.2±0.87µs        ? ?/sec      1.14     96.4±2.37µs        ? ?/sec
world_query_for_each/50000_entities_table                1.00     27.2±0.11µs        ? ?/sec      1.00     27.2±0.05µs        ? ?/sec
world_query_get/50000_entities_sparse                    1.00    463.4±1.03µs        ? ?/sec      1.00    463.5±1.49µs        ? ?/sec
world_query_get/50000_entities_sparse_wide               1.03  1425.0±13.95µs        ? ?/sec      1.00  1386.5±10.26µs        ? ?/sec
world_query_get/50000_entities_table                     1.00    273.3±3.24µs        ? ?/sec      1.00    272.8±1.27µs        ? ?/sec
world_query_get/50000_entities_table_wide                1.00   802.6±12.17µs        ? ?/sec      1.06   849.9±47.03µs        ? ?/sec
world_query_iter/50000_entities_sparse                   1.01     96.7±1.01µs        ? ?/sec      1.00     95.7±1.16µs        ? ?/sec
world_query_iter/50000_entities_table                    1.00     27.2±0.04µs        ? ?/sec      1.00     27.2±0.08µs        ? ?/sec
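The ratio columns appear to follow critcmp's convention of dividing each timing by the faster of the two being compared. A quick check against two rows (`query_get_component/50000_entities_table` and `world_query_for_each/50000_entities_sparse`), using a hypothetical `ratios` helper:

```rust
/// Hypothetical helper: divides each timing by the faster of the two,
/// which appears to be how the ratio columns above were produced.
fn ratios(main: f64, branch: f64) -> (f64, f64) {
    let fastest = main.min(branch);
    (main / fastest, branch / fastest)
}
```

Rounded to two decimals, `ratios(1411.2, 1112.3)` gives 1.27 and 1.00, and `ratios(84.2, 96.4)` gives 1.00 and 1.14, matching those rows.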

github-merge-queue bot pushed a commit that referenced this pull request Sep 27, 2024
…#15283)

## Objective

- Adopted #6396

## Solution

Same as #6396, we use a compile-time checked `StorageSwitch` union type
to select the fetch data based on the component's storage type, saving
at least 8 bytes per component fetch in a given query.
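A rough sketch of where the saving comes from, using hypothetical layouts: a struct that reserves room for both storages' pointers versus a union whose fields share the same space. The difference is one pointer, i.e. 8 bytes on a 64-bit target:

```rust
use std::mem::size_of;

/// Hypothetical "before" layout: room for both storages' pointers,
/// even though only one is used for any given component.
#[allow(dead_code)]
struct FetchBoth {
    table_components: *const u32,
    sparse_set: *const u64,
}

/// Hypothetical "after" layout: the two pointers overlap.
#[allow(dead_code)]
union FetchEither {
    table_components: *const u32,
    sparse_set: *const u64,
}

/// Bytes saved per component fetch by overlapping the pointers.
fn bytes_saved() -> usize {
    size_of::<FetchBoth>() - size_of::<FetchEither>()
}
```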

Note: We forgo the Query iteration change, as it now exists on main in a
slightly different form.

## Testing

- All current tests pass locally.

---------

Co-authored-by: james7132 <[email protected]>
Co-authored-by: Chris Russell <[email protected]>
robtfm pushed a commit to robtfm/bevy that referenced this pull request Oct 4, 2024
bas-ie (Contributor) commented Oct 6, 2024

Backlog cleanup: closing in favour of adopting PR #15283.

@bas-ie bas-ie closed this Oct 6, 2024