Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize allocation patterns in item and elsewhere #70423

Merged
merged 12 commits into from
Jan 6, 2024

Conversation

akrieger
Copy link
Member

@akrieger akrieger commented Dec 24, 2023

Summary

Performance "Optimizing mostly item to reduce load-time allocation by over 20%"

Purpose of change

Memory profiling cdda is extremely painful because just getting through load can take a stupid amount of time. Reducing the number of allocations done during load can help with that specifically, but also generally optimizing for allocations is beneficial for performance.

Describe the solution

There's a few things going on here.

  • item contains several std:: container types. These types are not all noexcept movable, so item can't be noexcept movable either. This means containers of items like vectors have to copy items on move, instead of efficiently moving them, which is a horrendous waste of allocations. Introduce a heap<> template wrapper which we can use to put these nested containers into the heap. Although this is another level of indirection, these aren't hotly accessed members, so the penalty is negligible. But, by wrapping the std containers in the heap<> type, we can make item be noexcept movable, which is vastly more performant. This by itself eliminates many millions of allocations.
  • Secondarily, because the heap<> type operates as unique_ptr on move, we don't pay the penalty of reinitializing the containers when they are moved-from. These reinitializations are typically wasted allocations that are immediately freed. This does mean a moved-from item is definitely not reusable anymore, but the typical pattern is moved-from objects aren't reusable anyway.
  • We can save a few more millions of allocations by introducing an 'externally allocating' version of utf8_display_split. The original creates a separate string per non-zero-width utf8 character in mapgen pallettes, afaict. We can instead write a version that pushes string_views into a by-ref vector argument. That vector of string_views can then be statically allocated in mapgen and cached across calls, resulting in effectively constant allocation space. (Alternatively, using a cata::small_literal_vector to stack allocate may work because string_view is trivial and qualifies for that helper class).
  • Apply lazy<> to a few more members, such as field::_field_type_list and, notably, item's safe_reference_anchor member, also saves noticable amounts of allocs.

Describe alternatives you've considered

Testing

Use a gui based heapprof like program to instrument the game loading.

Before:

before_allocs

before_allocs_detail

After:

after_allocs

after_allocs_detail

The number is 'number of allocations made', as in, calls to malloc, not bytes allocated. Most of the savings are in Item_factory::check_definitions

Additional context

@github-actions github-actions bot added Map / Mapgen Overmap, Mapgen, Map extras, Map display [C++] Changes (can be) made in C++. Previously named `Code` Fields / Furniture / Terrain / Traps Objects that are part of the map or its features. Code: Performance Performance boosting code (CPU, memory, etc.) labels Dec 24, 2023
@akrieger
Copy link
Member Author

Great, I managed to make every compiler mad at me somehow.

@github-actions github-actions bot added the astyled astyled PR, label is assigned by github actions label Dec 24, 2023
src/value_ptr.h Outdated Show resolved Hide resolved
src/value_ptr.h Outdated Show resolved Hide resolved
@akrieger akrieger force-pushed the itemizing_item_optimizing branch from 8216003 to ec501ee Compare December 26, 2023 20:44
@github-actions github-actions bot added the json-styled JSON lint passed, label assigned by github actions label Dec 26, 2023
src/field.cpp Show resolved Hide resolved
@github-actions github-actions bot added the BasicBuildPassed This PR builds correctly, label assigned by github actions label Dec 26, 2023
@Maleclypse
Copy link
Member

Clang has many errors. Do they belong to you?

@akrieger
Copy link
Member Author

Yep I made it mad. I'll have to fix.

@akrieger
Copy link
Member Author

akrieger commented Jan 6, 2024

Hm all my stuff together, not separately, seems to be hitting an issue. Let me verify I didn't mess up a merge somewhere.

@akrieger akrieger marked this pull request as draft January 6, 2024 00:47
@akrieger
Copy link
Member Author

akrieger commented Jan 6, 2024

Ok this is fine, I messed up a merge with #70685, so just merge this and when it's in master I'll manually rebase #70685 (easy to miss a ! to invert a bool).

@akrieger akrieger marked this pull request as ready for review January 6, 2024 01:05
@Maleclypse Maleclypse merged commit 8c5e33d into CleverRaven:master Jan 6, 2024
30 of 38 checks passed
@akrieger akrieger deleted the itemizing_item_optimizing branch January 6, 2024 17:56
@akrieger akrieger added the 0.H Backport PR to backport to the 0.H stable release canddiate label Jan 6, 2024
kevingranade added a commit that referenced this pull request May 14, 2024
@Procyonae Procyonae added 0.H Backported and removed 0.H Backport PR to backport to the 0.H stable release canddiate labels May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.H Backported astyled astyled PR, label is assigned by github actions BasicBuildPassed This PR builds correctly, label assigned by github actions [C++] Changes (can be) made in C++. Previously named `Code` Code: Performance Performance boosting code (CPU, memory, etc.) Fields / Furniture / Terrain / Traps Objects that are part of the map or its features. json-styled JSON lint passed, label assigned by github actions Map / Mapgen Overmap, Mapgen, Map extras, Map display
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants