
Change Blockstore max_root from RwLock<Slot> to AtomicU64 #33998

Merged
merged 8 commits into from
Nov 10, 2023

Conversation

steviez
Contributor

@steviez steviez commented Nov 9, 2023

Problem

The Blockstore currently maintains a RwLock<Slot> of the maximum root it has seen inserted. The value is initialized during Blockstore::open() and updated during calls to Blockstore::set_roots(). Accessing this RwLock is cheaper than constructing an iterator through the roots column every time the maximum root value is desired. The maximum root value is queried fairly often:

  • Shred ingestion checks the parent slot against the max root; slots with a parent older than the most recent root can immediately be discarded as not being on the main fork
  • RPC calls want to know the max root for finding slots with confirmation level confirmed

However, the access patterns of the max root mirror those of an atomic, and the RwLock is overkill. Namely,

  • All reads simply take the read lock to copy out the inner value
  • The single writer holds the write lock just long enough to fetch the value, do some operations, and then store a new value
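These access patterns can be sketched side by side. This is a minimal sketch, not the PR's actual code; the `Slot` alias, the function names, and the `Ordering::Relaxed` choices are assumptions made for illustration:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

type Slot = u64; // assumed alias, mirroring the Slot type used in the codebase

// Before: every reader acquires the read lock just to copy the value out.
fn max_root_locked(max_root: &RwLock<Slot>) -> Slot {
    *max_root.read().unwrap()
}

// After: a plain atomic load, no lock traffic on the hot path.
fn max_root_atomic(max_root: &AtomicU64) -> Slot {
    max_root.load(Ordering::Relaxed)
}

// The writer's fetch / compare / store pattern collapses into fetch_max,
// which only ever moves the stored value forward.
fn set_max_root(max_root: &AtomicU64, new_root: Slot) {
    max_root.fetch_max(new_root, Ordering::Relaxed);
}
```

Because the value is a single `u64` with one writer and many readers, the atomic gives the same observable behavior without readers ever contending on a lock.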

Summary of Changes

  • Rename last_root field to max_root
  • Switch max_root from a RwLock<Slot> to AtomicU64

This change is fairly small, but it is still broken up by commit for easier review.

ledger/src/blockstore.rs
@@ -1192,7 +1182,7 @@ impl Blockstore {
             return false;
         }

-        if !Blockstore::should_insert_coding_shred(&shred, &self.last_root) {
+        if !Blockstore::should_insert_coding_shred(&shred, self.max_root()) {
Copy link
Contributor Author

@steviez steviez Nov 9, 2023


The former RwLock / now AtomicU64 is getting fetched for every single shred that is inserted. I was thinking we could fetch this value once at the start of the loop that iterates through every shred in Blockstore::do_insert_shreds(). Looking at metrics for my node for the last 24 hours, I'm seeing something like 4k shreds per second (blockstore-insert-shreds.num_shreds is around 8k but divide by 2 since the metric is reported every 2s).

However, glancing at recv-window-insert-shreds.run_insert_count, I see a value on the order of thousands, which would indicate that on average, we're calling Blockstore::insert_shreds() with a small number of shreds (i.e., 8k shreds from above / 4000 calls = 2 shreds/call). So, the gains of reading once at the start may not be so big (a factor of 2), but I'm wondering if having so many calls with so few shreds per call is intended / optimal behavior.

Probably inclined to leave this alone for now and revisit later when/if investigating why there are so few shreds per call to Blockstore::insert_shreds(). From a pure efficiency standpoint, coalescing shreds would amortize the overhead incurred in insert_shreds(). However, shred ingestion directly feeds replay, and any increase in latency here would likely be deemed unacceptable.
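The trade-off being weighed above, per-shred loads versus one load per batch, can be sketched as follows. The helper names and the stub Shred type are hypothetical, not the actual Blockstore code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

type Slot = u64; // assumed alias for illustration

#[derive(Clone, Debug, PartialEq)]
struct Shred {
    slot: Slot,
}

// Per-shred load: one atomic read for every shred in the batch.
fn filter_per_shred(shreds: &[Shred], max_root: &AtomicU64) -> Vec<Shred> {
    shreds
        .iter()
        .filter(|s| s.slot > max_root.load(Ordering::Relaxed))
        .cloned()
        .collect()
}

// Hoisted load: read max_root once at the top of the batch, then
// compare each shred against that local copy.
fn filter_hoisted(shreds: &[Shred], max_root: &AtomicU64) -> Vec<Shred> {
    let root = max_root.load(Ordering::Relaxed);
    shreds.iter().filter(|s| s.slot > root).cloned().collect()
}
```

With the ~2 shreds/call observed in the metrics above, hoisting saves only about one load per call, which is why the comment concludes the gain may not be worth it until batches get bigger.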


codecov bot commented Nov 9, 2023

Codecov Report

Merging #33998 (323a398) into master (67f8daf) will decrease coverage by 0.1%.
Report is 4 commits behind head on master.
The diff coverage is 97.2%.

@@            Coverage Diff            @@
##           master   #33998     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         811      811             
  Lines      219412   219407      -5     
=========================================
- Hits       179771   179750     -21     
- Misses      39641    39657     +16     

@steviez steviez marked this pull request as ready for review November 9, 2023 18:33
Contributor

@CriesofCarrots CriesofCarrots left a comment


Cool, this turned out to be quite straightforward! r+ restoring pub fn last_root()

ledger/src/blockstore.rs
CriesofCarrots
CriesofCarrots previously approved these changes Nov 10, 2023
The max_root RwLock had the following workload:
- The single writer holds the write lock just long enough to update
- Readers hold the read lock just long enough to read the value out

Thus, the access patterns do not require this to be a lock, and we can
use a simple atomic
I'm slightly dubious of the test given that it "rolls back" a root; this
can't happen in practice.
@steviez
Copy link
Contributor Author

steviez commented Nov 10, 2023

@CriesofCarrots - Can I get another ship-it? I had to rebase to resolve a conflict from the other PR, which ripped out the test-only function that allowed manual modification of the previous RwLock<u64> / now AtomicU64.

@behzadnouri
Copy link
Contributor

should the rename be separated out into its own PR?

@steviez
Copy link
Contributor Author

steviez commented Nov 10, 2023

should the rename be separated out into its own PR?

Arguably it could be; however, between the total diff not being overly large and me having intermediate commits for rename / updating callers / change to Atomic / etc, I feel like this is fine to leave as one.

@steviez steviez merged commit b91da22 into solana-labs:master Nov 10, 2023
32 checks passed
@steviez steviez deleted the bstore_max_root branch November 10, 2023 23:27
      debug_assert_matches!(shred.sanitize(), Ok(()));
-     shred.is_code() && shred.slot() > *last_root.read().unwrap()
+     shred.is_code() && shred.slot() > max_root
Copy link
Contributor


There is a slight change in behavior here: previously, the code fetched the most recent last_root by passing the &RwLock<u64> directly here, but now the max_root is loaded earlier in the call stack and can be stale, no longer equal to self.max_root().

This can potentially result in more unnecessary processing of old shreds, but I don't know how much impact it has.

Is there a downside to passing &AtomicU64 here and doing the load here, as opposed to earlier in the call stack?
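The alternative being suggested can be sketched with a stub signature (hypothetical names, not the actual Blockstore code): pass the atomic itself down, so the load happens at the comparison site and the value cannot go stale between the load and the check.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

type Slot = u64; // assumed alias for illustration

// Hypothetical variant of should_insert_coding_shred() that takes the
// atomic itself, deferring the load to the comparison site.
fn should_insert_coding_shred(slot: Slot, is_code: bool, max_root: &AtomicU64) -> bool {
    // Short-circuiting: the load only happens for coding shreds.
    is_code && slot > max_root.load(Ordering::Relaxed)
}
```

The cost is a tighter coupling of the helper to the atomic type; the benefit is the freshest possible root at the moment of the comparison.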

Copy link
Contributor Author


but now the max_root is loaded earlier in the call stack and can be stale and not be equal to self.max_root() anymore.

This can potentially result in more unnecessary processing of old shreds, but I don't know how much impact it has.

Blockstore::check_insert_coding_shred() is the caller and the new location of self.max_root():

if !Blockstore::should_insert_coding_shred(&shred, self.max_root()) {

And here is the full implementation of should_insert_coding_shred():

fn should_insert_coding_shred(shred: &Shred, max_root: Slot) -> bool {
debug_assert_matches!(shred.sanitize(), Ok(()));
shred.is_code() && shred.slot() > max_root
}

So, the change is that self.max_root() is now called before shred.is_code() && shred.slot():

  • shred.is_code() is a simple equality check on the shred type
  • shred.slot() just references the slot from the shred's common header

So, while the value is loaded earlier in the call stack, the only work between that load and the comparison is these two minimal operations.
