[SharedCache] Split state into initial, loaded, and modified #6393

Draft
wants to merge 1 commit into dev

Conversation

bdash
Contributor

bdash commented Feb 5, 2025

NOTE: I'm sending this as a draft for the sake of discussion. The change is complete, but would ideally be split into smaller changes for easier review. I've had it in my tree for six weeks now and haven't found the energy to break it up, so I figured I'd check whether there are any major objections to this approach before I spend the time doing so.

The bottom line here is that this change:

  1. Avoids the performance hit from repeatedly copying SharedCache's state.
  2. Avoids the performance hit from repeatedly serializing SharedCache's state to JSON.

The end result is a > 2x performance improvement on loading images from the shared cache, with a clear path towards safely loading images in parallel.

@0cyn @WeiN76LQh


The initial state, CacheInfo, is initialized during PerformInitialLoad and is immutable after that point. This required some slight restructuring of how information about memory regions is tracked, since that information was previously modified as regions were loaded. Memory regions are now stored in a map from their address range to the MemoryRegion object, which makes it cheap to look them up by address, a common operation.
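
As a rough illustration of why a range-keyed map makes these lookups cheap, here is a minimal sketch; the AddressRange and MemoryRegion shapes and the RegionContaining helper are hypothetical stand-ins, not the plugin's actual types:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for the plugin's types.
struct AddressRange {
    uint64_t start;
    uint64_t end;  // exclusive
    bool operator<(const AddressRange& other) const { return start < other.start; }
};

struct MemoryRegion {
    std::string name;
    bool loaded = false;
};

using MemoryRegionMap = std::map<AddressRange, MemoryRegion>;

// Ordering the map by start address makes lookup by address a single
// upper_bound plus a containment check.
const MemoryRegion* RegionContaining(const MemoryRegionMap& regions, uint64_t address)
{
    auto it = regions.upper_bound(AddressRange{address, address});
    if (it == regions.begin())
        return nullptr;
    --it;  // Last region whose start is <= address.
    if (address < it->first.end)
        return &it->second;
    return nullptr;
}
```

Because the map is ordered by start address, regions that are adjacent in the address space are also adjacent in the map, so iterating regions in address order comes for free.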

The modified state, ModifiedState, consists of changes since the last save to the DSCView / ViewSpecificState. This means it is no longer necessary to copy any state when mutating a SharedCache instance for the first time. Instead, its data structures start off empty and are populated as images, sections, or symbol information is loaded.

The loaded state, State, consists of all modified state that has previously been saved. It lives on the ViewSpecificState. Saving modified state merges it into the existing loaded state. State and ModifiedState are almost identical in what they contain, but are separate types to make it clearer which contains all state and which contains only the modifications.
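
A pared-down sketch of the shape this split takes, assuming hypothetical field names; the point is only how saving folds a ModifiedState into the accumulated State:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Immutable once PerformInitialLoad finishes, so it never needs to be
// copied or locked. (Hypothetical field, for illustration only.)
struct CacheInfo {
    std::unordered_map<uint64_t, std::string> imagePathsByAddress;
};

// Changes made by one SharedCache instance since the last save. Starts
// empty, so a first mutation no longer requires copying existing state.
struct ModifiedState {
    std::unordered_map<uint64_t, std::string> loadedImageNames;
};

// Everything that has previously been saved; lives on ViewSpecificState.
struct State {
    std::unordered_map<uint64_t, std::string> loadedImageNames;

    // Saving merges the delta in, after which the ModifiedState can be
    // cleared and start accumulating changes again.
    void Merge(ModifiedState&& modified)
    {
        for (auto& [address, name] : modified.loadedImageNames)
            loadedImageNames[address] = std::move(name);
        modified.loadedImageNames.clear();
    }
};
```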

This pattern is carried over to the Metadata stored on the DSCView. The initial state is stored under its own metadata key, and each modified state is stored under a key with an incrementing number. This means each save of the state only needs to serialize the state that changed, rather than reserializing all of the state every time.
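
One way such incremental saves could look, as a hedged sketch; the key naming scheme and the SerializeToJSON / StoreMetadata helpers are placeholders rather than the plugin's actual metadata API:

```cpp
#include <cstdint>
#include <string>

struct ModifiedState;  // As sketched above.

// Placeholders for the plugin's JSON serialization and BinaryView
// metadata storage.
std::string SerializeToJSON(const ModifiedState& state);
void StoreMetadata(const std::string& key, const std::string& json);

// Each save writes only the delta under a new, numbered key. The initial
// state and earlier deltas keep their existing keys and are never
// reserialized.
void SaveModifiedState(const ModifiedState& state, uint64_t& nextSaveIndex)
{
    const std::string key = "SharedCache.modifiedState." + std::to_string(nextSaveIndex++);
    StoreMetadata(key, SerializeToJSON(state));
}
```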

There are two huge benefits from these changes:

  1. At no point does SharedCache have to copy its in memory state. The basic copy-on-write approach introduced in [SharedCache] Use basic copy-on-write for viewStateCache #6129 reduced how often these copies are made, but they're still frequent and very expensive. This eliminates the 🐄.
  2. At no point does SharedCache have to re-serialize state to JSON that it has already serialized. JSON serialization previously added hundreds of milliseconds to any mutating operation on SharedCache.

As a result, this speeds up the initial load of the shared cache by around 2x and loading of subsequent images improves by about the same.

One trade-off is that the serialization / deserialization logic is more complicated. There are two reasons for this:

  1. The state is now split across multiple metadata keys and needs to be merged when it is loaded.
  2. The in-memory representation uses pointers to identify memory regions. These relationships have to be re-established after the JSON is deserialized (see the sketch after this list).
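
A hypothetical sketch of what that means for the load path: merge every saved delta in order, then re-establish pointer relationships the JSON cannot express. The helpers and field names are illustrative only:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct MemoryRegion {
    std::string name;
};

struct LoadedState {
    // std::map nodes are stable, so pointers into this map stay valid.
    std::map<uint64_t, MemoryRegion> regionsByStartAddress;
    // The serialized form stores region start addresses...
    std::map<uint64_t, uint64_t> symbolRegionStarts;
    // ...while the in-memory form wants pointers to the regions.
    std::map<uint64_t, MemoryRegion*> symbolRegions;
};

// Placeholders for enumerating metadata keys in save order and for
// deserializing one key's JSON and merging it into the state.
std::vector<std::string> MetadataKeysInSaveOrder();
void DeserializeAndMergeInto(LoadedState& state, const std::string& key);

LoadedState LoadState()
{
    LoadedState state;

    // 1. Merge every saved delta, oldest first, into one in-memory state.
    for (const auto& key : MetadataKeysInSaveOrder())
        DeserializeAndMergeInto(state, key);

    // 2. Re-establish pointer relationships by looking each stored
    //    address back up in the region map.
    for (const auto& [symbolAddress, regionStart] : state.symbolRegionStarts)
    {
        auto it = state.regionsByStartAddress.find(regionStart);
        if (it != state.regionsByStartAddress.end())
            state.symbolRegions[symbolAddress] = &it->second;
    }
    return state;
}
```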

As a future direction, it is worth considering whether the logic owned by SharedCache could be split in a similar manner to the data. The initial loading of the cache header, loading of images, and handling of symbol information are all mostly independent and work on separate data. If the logic were split into separate classes, it would be easier to reason about which data is valid when, and it would easily permit concurrent loading of multiple images from the shared cache in a thread-safe manner.

plafosse requested a review from 0cyn on February 5, 2025
@0cyn
Member

0cyn commented Feb 5, 2025

I'm a pretty big fan of everything this proposes, including the next steps re: splitting the logic into separate classes.

@WeiN76LQh

Sounds great and certainly the way things needed to go.

This is more for the future; I haven't had time to dive into the actual commit, so this may not be an issue, but when it comes to concurrent loading of images I think further changes will be required. An issue I've had there is the UI blocking for a long time. I came to the realization that, whilst it may be a Binary Ninja issue, the solution is to use the bulk add segments/memory region APIs to prevent the UI from trying to update an unnecessary number of times during concurrent loading. Whether that will prevent the UI from locking up entirely, I don't know, but as far as I can tell it should help. The problem is that I believe it will require a little bit of rewriting of how images are loaded, because sections can't be added until the bulk add call has finished. So concurrent loading of the segments/memory regions would have to occur first, and then the adding of sections and post-processing of them could be done. That's what I gathered when I tried to wrap concurrent image loading with bulk add segments/memory region API calls.

@bdash
Contributor Author

bdash commented Feb 6, 2025

In the short term this makes concurrent loading slightly less practical, as it uses a single lock within SharedCache to protect all modified state. Access to the initial state (CacheInfo) is not guarded, as it is immutable before it becomes visible to multiple threads.

As we've talked about, the existing locking is inconsistent and has correctness issues. Some of these are a result of my change that introduced COW. I felt it was best to make the locking clearly correct for now with the goal of moving to finer-grained locking when functionality is split out of SharedCache.

A next step to support concurrent image loading could be to move responsibility for loading a single image (and the state associated with that) into a separate type. SharedCache could then enforce that only one caller can attempt to load a given image at a time. This should permit concurrent image loading while keeping it easy to reason about thread safety.
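
One possible shape for that enforcement, sketched with hypothetical names rather than anything in the current code:

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical sketch of the "one loader per image at a time" idea; the
// class and its image key type are illustrative placeholders.
class SharedCacheLoadCoordinator
{
public:
    // Returns true if the caller won the right to load this image and
    // should proceed; false if another thread is already loading it.
    bool TryBeginLoad(uint64_t imageStart)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_inProgress.insert(imageStart).second;
    }

    void EndLoad(uint64_t imageStart)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_inProgress.erase(imageStart);
    }

private:
    std::mutex m_mutex;
    std::unordered_set<uint64_t> m_inProgress;
};
```

A loader thread would call TryBeginLoad before doing any work on an image and EndLoad once the results have been applied, so two threads never race on the same image while different images can still load in parallel.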

If adding segments and sections proves to be a bottleneck, then batching of mutations could be accommodated by splitting the responsibility slightly further. The image loader could populate state local to the shared cache plug-in, with a separate step applying the results of completed image loads to the view. Honestly, that'd probably be a win for clarity as well.

That said, I'm not sure whether adding segments and sections is still a bottleneck after the recent changes to how SharedCache interacts with the memory map (af7715d). That change significantly reduced the amount of time spent in memory map code in the profiles I've looked at.
