Purge TransactionStatus and AddressSignatures exactly from ledger-tool #10358
Conversation
This may be because compaction is happening while you're purging. Do the speeds look a lot better if you turn off automatic compaction?
    }
    let slot_meta = slot_meta.unwrap();

    let entries: Vec<Vec<Entry>> = PAR_THREAD_POOL_ALL_CPUS.with(|thread_pool| {
Maybe we can extract out this par_iter logic into a function that takes a thread pool argument, so it can be reused within get_slot_entries_with_shred_info
This turned out to be a bit of a pain, since we return a Result in one case and just the Vec in another. Okay to leave as-is?
Yeah no worries
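For reference, a minimal sketch of the helper discussed above, assuming a rayon-style pool; the function name and signature are hypothetical, not part of Blockstore:

```rust
use rayon::prelude::*;

// Hypothetical shared helper: run per-range work on a caller-supplied thread
// pool so both call sites could reuse the same parallel logic.
fn par_map_ranges<R, T, F>(thread_pool: &rayon::ThreadPool, ranges: &[R], f: F) -> Vec<T>
where
    R: Sync,
    T: Send,
    F: Fn(&R) -> T + Send + Sync,
{
    thread_pool.install(|| ranges.par_iter().map(f).collect())
}
```

The Result-vs-Vec mismatch mentioned above could be bridged by letting the closure return a Result and collecting into Result<Vec<_>, _>, which rayon's collect supports.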
    let entries: Vec<Vec<Entry>> = PAR_THREAD_POOL_ALL_CPUS.with(|thread_pool| {
        thread_pool.borrow().install(|| {
            completed_ranges
Hmm out of curiosity, how big are these ranges?
I think it'll be interesting to see if we get better parallel perf by splitting the work into chunks of slots here: https://github.com/solana-labs/solana/pull/10358/files#diff-9ed453cbc107691c26b7e060d05e2ab0R1875 and distributing those among a thread pool, instead of using the thread pool for individual shred ranges as we currently have it.
On MNB, completed_ranges.len() varies from ~80-150, mostly 100+.
Could we try splitting by chunks of slots in a later optimization?
ah yeah if the ranges are that large, you probably won't be getting much parallel work as that all goes to one thread
As to the optimization, yeah for sure, I was just curious
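To make that alternative concrete, here is a rough sketch of parallelizing over chunks of slots instead of over the shred ranges within one slot; load_entries_for_slot and the Entry alias are placeholders, not real Blockstore items, and the chunk size is arbitrary:

```rust
use rayon::prelude::*;

type Entry = Vec<u8>; // stand-in for the real Entry type

// Placeholder for the existing per-slot entry loading.
fn load_entries_for_slot(_slot: u64) -> Vec<Entry> {
    unimplemented!()
}

// Hand each chunk of slots to one pool thread, rather than parallelizing over
// the shred ranges inside a single slot.
fn entries_for_slots(slots: &[u64]) -> Vec<Vec<Entry>> {
    const SLOTS_PER_CHUNK: usize = 32; // arbitrary tuning knob
    slots
        .par_chunks(SLOTS_PER_CHUNK)
        .flat_map_iter(|chunk| chunk.iter().map(|slot| load_entries_for_slot(*slot)))
        .collect()
}
```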
Codecov Report

@@            Coverage Diff            @@
##           master   #10358     +/-   ##
=========================================
  Coverage    81.4%    81.4%
=========================================
  Files         289      289
  Lines       67315    67450     +135
=========================================
+ Hits        54811    54925     +114
- Misses      12504    12525      +21
@@ -0,0 +1,1111 @@
    use super::*;

    impl Blockstore {
yay thanks for the refactor!
Purge TransactionStatus and AddressSignatures exactly from ledger-tool (#10358)

* Add failing test
* Add execution path to purge primary-index columns exactly
* Fail gracefully if older TransactionStatus rocksdb keys are present
* Remove columns_empty check for special columns
* Move blockstore purge methods to submodule
* Remove unused column empty check

(cherry picked from commit eee9a08)
Problem
The primary-index-based purge mechanism for the TransactionStatus and AddressSignatures columns is based on the assumption that all purging will happen at the back of the ledger, from the lowest slots onward, as performed by the ledger-cleanup-service. However, solana-ledger-tool allows purging of any slot range, and as our recent mainnet-beta reboot demonstrates, there may be important times we need to purge at the front of the ledger. As the code is currently written, such purges could result in unintended loss of data from the TransactionStatus and AddressSignatures columns.
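To make the failure mode concrete, here is a conceptual sketch (a BTreeMap stand-in, not the real Blockstore code) of why these columns need an exact purge: the slot is not the leading key component, so a slot range cannot be removed with one contiguous key-range delete; matching keys have to be found and deleted individually.

```rust
use std::collections::BTreeMap;

// Simplified key shape; the real columns use a 64-byte Signature and richer types.
type Signature = [u8; 8];
type Key = (u64 /* primary index */, Signature, u64 /* slot */);

// "Exact" purge: scan for every key whose slot falls in the range and delete
// it one by one, since the slot is the last component of the key.
fn purge_exact(column: &mut BTreeMap<Key, Vec<u8>>, from_slot: u64, to_slot: u64) {
    let doomed: Vec<Key> = column
        .keys()
        .filter(|(_, _, slot)| (from_slot..to_slot).contains(slot))
        .cloned()
        .collect();
    for key in doomed {
        column.remove(&key);
    }
}
```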
Summary of Changes
Use exact purging of the TransactionStatus and AddressSignatures columns in solana-ledger-tool purge. This results in significantly slower purge operations, but the number of threads used is bumped to help offset the cost.

Benchmarked on a current mainnet-beta api node:
With this code, each deletion of 1000 slots at current mainnet-beta transaction rates takes ~6sec (as compared to ~100us).
10_000 slots:
deletion - 39sec (vs 2ms)
compaction - 6m20sec (vs 3m20sec)

100_000 slots:
deletion - 16m17sec (vs 30ms)
compaction - 4m6sec (vs 1m32sec) ... I don't really understand why the compaction was faster for 100k slots than 10k
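As a hedged sketch of the thread-count bump mentioned above (assuming rayon and the num_cpus crate; the function name and setup are illustrative, not the Blockstore's actual configuration):

```rust
use rayon::{ThreadPool, ThreadPoolBuilder};

// Build a purge-specific pool that uses every logical CPU, to help offset the
// slower per-key deletions of an exact purge.
fn build_purge_thread_pool() -> ThreadPool {
    ThreadPoolBuilder::new()
        .num_threads(num_cpus::get())
        .thread_name(|i| format!("purge-{}", i))
        .build()
        .expect("failed to build purge thread pool")
}
```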
One option to make the longer time-to-purge less painful in dev contexts would be to add an arg to solana-ledger-tool purge that disables purging of the TransactionStatus and AddressSignatures column families.
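If that option were pursued, the arg might look roughly like the following clap-2-style sketch; the skip-special-columns flag is purely hypothetical and does not exist in solana-ledger-tool:

```rust
use clap::{App, Arg};

// Hypothetical flag: illustrates the proposed arg, not an existing option.
fn purge_subcommand<'a, 'b>() -> App<'a, 'b> {
    App::new("purge").arg(
        Arg::with_name("skip_special_columns")
            .long("skip-special-columns")
            .takes_value(false)
            .help("Skip purging the TransactionStatus and AddressSignatures columns"),
    )
}
```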