
Shred Insertion Abstraction #3637

Merged
6 commits merged into anza-xyz:master from shred_insert_abstraction on Dec 3, 2024

Conversation

@bw-solana commented Nov 14, 2024

Problem

do_insert_shreds is over 300 lines and very unwieldy.

Summary of Changes

The work was split into commits to make it easier to review in steps:

  1. Establish the ShredInsertionTracker struct to bundle together the HashMaps, Vecs, and other collections that track updates (a sketch follows this list).
  2. Abstract the shred insertion, recovery, and write-commit steps into separate functions. Add two more items to ShredInsertionTracker that should have been added in step 1. Fix a misnamed measurement.
  3. Shuffle some code around to make the abstractions and tracking cleaner.
  4. Feed ShredInsertionTracker into check_insert_data_shred. Extend newly completed slots within check_insert_data_shred.
  5. Feed ShredInsertionTracker into check_insert_coding_shred.
  6. Update comments.
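
For orientation, here is a minimal sketch of the kind of bundling step 1 describes. The field names and key types below are illustrative guesses based on this conversation (just_inserted_shreds, the slot meta working set, erasure/merkle root metas, the slot index, and duplicate-shred tracking are mentioned later in the thread); this is not the actual definition in blockstore.rs.

```rust
use std::collections::{BTreeMap, HashMap};

// Placeholder types standing in for the real blockstore types; the real
// definitions are not reproduced here.
struct Shred;
struct SlotMetaWorkingSetEntry;
struct ErasureMeta;
struct MerkleRootMeta;
struct IndexMetaWorkingSetEntry;
struct PossibleDuplicateShred;
struct CompletedDataSetInfo;
struct WriteBatch;

/// Bundles the per-call bookkeeping that do_insert_shreds previously threaded
/// through ~300 lines as loose HashMaps and Vecs (field set is illustrative).
struct ShredInsertionTracker {
    just_inserted_shreds: HashMap<(u64, u32), Shred>, // keyed by (slot, shred index)
    slot_meta_working_set: HashMap<u64, SlotMetaWorkingSetEntry>,
    index_working_set: HashMap<u64, IndexMetaWorkingSetEntry>,
    erasure_metas: BTreeMap<(u64, u64), ErasureMeta>, // keyed by (slot, fec_set_index)
    merkle_root_metas: HashMap<(u64, u64), MerkleRootMeta>,
    duplicate_shreds: Vec<PossibleDuplicateShred>,
    newly_completed_data_sets: Vec<CompletedDataSetInfo>,
    write_batch: WriteBatch,
}
```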

@bw-solana changed the title from "add tracker struct" to "Shred Insertion Abstraction" on Nov 14, 2024
@bw-solana marked this pull request as ready for review on November 14, 2024 20:57
@bw-solana requested review from cpubot and steviez on November 14, 2024 20:57
@bw-solana force-pushed the shred_insert_abstraction branch from 84766d2 to 41c02a9 on November 26, 2024 17:15
@bw-solana (Author)

@steviez rebased on top of your latest changes. Wasn't too bad really

@steviez left a comment


LGTM - this is a nice breakdown of the major "steps" of shred insertion

@bw-solana (Author)

@behzadnouri - Feel free to review everything if you'd like, but I'm specifically adding you to check this commit: c8dad38

I moved commit_slot_meta_working_set to happen after check_chained_merkle_root_consistency so that I could group all of the write batch updates together. I think this is fine but could use your review.
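
For context, a hypothetical skeleton of the ordering being discussed. Only commit_slot_meta_working_set and check_chained_merkle_root_consistency are named in this PR; the remaining function names, signatures, and types are stand-ins, not the actual blockstore API.

```rust
// Stub types and steps; each function body is elided for the sketch.
struct Blockstore;
struct ShredInsertionTracker;

fn insert_received_shreds(_b: &Blockstore, _t: &mut ShredInsertionTracker) {}
fn attempt_shred_recovery(_b: &Blockstore, _t: &mut ShredInsertionTracker) {}
fn check_chained_merkle_root_consistency(_b: &Blockstore, _t: &mut ShredInsertionTracker) {}
fn commit_slot_meta_working_set(_t: &mut ShredInsertionTracker) {}

fn do_insert_shreds_skeleton(blockstore: &Blockstore, tracker: &mut ShredInsertionTracker) {
    insert_received_shreds(blockstore, tracker); // data + coding shreds
    attempt_shred_recovery(blockstore, tracker); // erasure recovery
    // The consistency check reads the tracker's just-inserted shreds, so it
    // does not require the commits below to have happened yet.
    check_chained_merkle_root_consistency(blockstore, tracker);
    // Slot meta now commits here, after the check, grouped with the other
    // write-batch updates (erasure meta, merkle root meta, slot index).
    commit_slot_meta_working_set(tracker);
    // ... remaining write-batch commits, then the batch is written out.
}
```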

@behzadnouri

Also cc @AshwinSekar to take a look.

> I moved commit_slot_meta_working_set to happen after check_chained_merkle_root_consistency so that I could group all of the write batch updates together. I think this is fine but could use your review.

Don't we want to perform consistency checks after all writes are committed? Otherwise, e.g., check_chained_merkle_root_consistency might want to read data that is not committed yet.

Also, I didn't understand why we need this specific reordering to achieve this:

> group all of the write batch updates together

Mind clarifying this further?

@AshwinSekar

check_chained_merkle_root_consistency is fine to run before writes are committed (see the sketch after this list):

  • It only runs the check if we've newly populated an ErasureMeta or MerkleRootMeta.
  • This means we've received the first shred from the FEC set in this batch.
  • As a result, we use just_inserted_shreds rather than a db lookup.
  • When grabbing the shred from the next/previous FEC set to perform the check, we already default to using just_inserted_shreds, with the db as a backup.
  • We don't read the slot meta.
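
A minimal sketch of the lookup pattern the last two bullets describe, with assumed types and method names (get_shred and get_shred_from_db are hypothetical): the shreds inserted in this batch are consulted first, and the database only as a fallback.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the real shred and blockstore types.
#[derive(Clone)]
struct Shred;
struct Blockstore;

impl Blockstore {
    // Hypothetical db lookup, used only as a fallback in this sketch.
    fn get_shred_from_db(&self, _slot: u64, _index: u32) -> Option<Shred> {
        None
    }

    /// Prefer the shreds inserted in this batch; fall back to the database.
    fn get_shred(
        &self,
        just_inserted_shreds: &HashMap<(u64, u32), Shred>,
        slot: u64,
        index: u32,
    ) -> Option<Shred> {
        just_inserted_shreds
            .get(&(slot, index))
            .cloned()
            .or_else(|| self.get_shred_from_db(slot, index))
    }
}
```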

@bw-solana (Author)

> Also, I didn't understand why we need this specific reordering to achieve this:
>
> > group all of the write batch updates together
>
> Mind clarifying this further?

This is purely for code readability: it lets us create the commit_updates_to_write_batch function, where all of the commits to the write batch (e.g. slot meta, erasure meta, merkle root meta, slot index) happen in one place.
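
A hedged sketch of that grouping, using a toy write batch rather than the real RocksDB-backed one. The column names and the WriteBatch type here are placeholders; only the idea of staging every commit in one helper comes from the comment above.

```rust
// Toy write batch that just records which columns were staged.
#[derive(Default)]
struct WriteBatch {
    ops: Vec<&'static str>,
}

impl WriteBatch {
    fn put(&mut self, column: &'static str) {
        self.ops.push(column);
    }
}

/// Illustrative only: one place where every column update is staged before the
/// batch is written, mirroring the grouping described above.
fn commit_updates_to_write_batch(batch: &mut WriteBatch) {
    batch.put("slot_meta");        // from commit_slot_meta_working_set
    batch.put("erasure_meta");
    batch.put("merkle_root_meta");
    batch.put("slot_index");
}

fn main() {
    let mut batch = WriteBatch::default();
    commit_updates_to_write_batch(&mut batch);
    assert_eq!(batch.ops.len(), 4);
}
```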

@bw-solana (Author)

@behzadnouri - Any concerns with merging this in?

@behzadnouri

> @behzadnouri - Any concerns with merging this in?

The PR overall looks good and I like the reorganization of the code as the old code had become too hairy and error-prone.
My only hesitation is that, given that this is a big change with some re-ordering, it might introduce a subtle bug or race condition elsewhere, e.g. reading data that is not yet committed (or make such bugs more likely in the future if someone unknowingly changes the code).

That said if you guys are confident and comfortable that this is not a concern, it is fine with me to merge the code.

Separately, one idea that might help further reorganize this code is to move ShredInsertionTracker out of blockstore.rs and define the shred insertion functions as methods on ShredInsertionTracker, passing Blockstore as an argument. That way ShredInsertionTracker would act like a state machine for the work that needs to be done while inserting shreds.
I don't know if that is practical or a good idea at all, so it is definitely out of scope for this PR.
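
A rough, entirely hypothetical sketch of that suggestion: the tracker in its own module, with the insertion steps as methods that borrow the Blockstore. None of these method names or signatures exist in the codebase; they only illustrate the state-machine shape being proposed.

```rust
// Stand-in types; the real ones live in the blockstore crate.
struct Blockstore;
struct Shred;

struct ShredInsertionTracker {
    // working sets, write batch, metrics, ...
}

impl ShredInsertionTracker {
    /// Stage incoming data/coding shreds into the tracker's working sets.
    fn insert_shreds(&mut self, _blockstore: &Blockstore, _shreds: Vec<Shred>) {}

    /// Attempt erasure recovery over FEC sets that became complete.
    fn attempt_recovery(&mut self, _blockstore: &Blockstore) {}

    /// Flush the accumulated write batch into the blockstore.
    fn commit(self, _blockstore: &Blockstore) {}
}
```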

@bw-solana (Author)

> My only hesitation is that, given that this is a big change with some re-ordering, it might introduce a subtle bug or race condition elsewhere

I share the same concern. I've been running a canary on mainnet for a week or so and haven't observed any misbehavior, so I have some level of confidence here.

I like the idea of splitting out the tracker into its own file and building out more of an interface.

@bw-solana merged commit fe2f5b3 into anza-xyz:master on Dec 3, 2024
41 checks passed
@bw-solana deleted the shred_insert_abstraction branch on December 3, 2024 16:28