Purge TransactionStatus and AddressSignatures exactly from ledger-tool #10358

Merged · 6 commits into solana-labs:master · Jun 3, 2020

Conversation

CriesofCarrots (Contributor)

Problem

The primary-index-based purge mechanism for the TransactionStatus and AddressSignatures columns is based on the assumption that all purging will happen at the back of the ledger, from the lowest slots onward, as performed by the ledger-cleanup-service. However, solana-ledger-tool allows purging of any slot range, and as our recent mainnet-beta reboot demonstrated, there may be important times when we need to purge at the front of the ledger. As the code is currently written, such purges could result in unintended loss of data from the TransactionStatus and AddressSignatures columns.
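For context, a minimal sketch of the difference (illustrative only: a Vec stands in for the RocksDB column, and the key layout (primary_index, signature, slot) is an assumption, not the crate's actual API). The fast path drops everything filed under a primary index in one sweep, while an exact purge has to check each entry's slot, which is slower but leaves data outside the requested range intact.

```rust
// Illustrative stand-in for the column; not the real Blockstore storage.
type Slot = u64;
type Signature = [u8; 64];

struct StatusEntry {
    primary_index: u64,
    signature: Signature,
    slot: Slot,
}

/// Fast path: assumes everything filed under `index` is older than the purge
/// point, so the whole primary-index range can be dropped at once.
fn purge_by_primary_index(entries: &mut Vec<StatusEntry>, index: u64) {
    entries.retain(|e| e.primary_index != index);
}

/// Exact path: inspects every entry and removes only those whose slot falls
/// in [from_slot, to_slot], so purging at the front of the ledger cannot
/// take unrelated statuses with it.
fn purge_exact(entries: &mut Vec<StatusEntry>, from_slot: Slot, to_slot: Slot) {
    entries.retain(|e| e.slot < from_slot || e.slot > to_slot);
}
```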

Summary of Changes

  • Add a separate pathway to purge the TransactionStatus and AddressSignatures columns exactly, and use it in solana-ledger-tool purge. This results in significantly slower purge operations, but bumps the number of threads being used to help offset the cost.
  • Also pulls in some blockstore refactoring from #9038 (Swap index order in blockstore TransactionStatus column)

Benchmarked on a current mainnet-beta API node:
With this code, each deletion of 1000 slots at current mainnet-beta transaction rates takes ~6sec (as compared to ~100us).

10_000 slots:
deletion - 39sec (vs 2ms)
compaction - 6m20sec (vs 3m20sec)

100_000 slots:
deletion - 16m17sec (vs 30ms)
compaction - 4m6sec (vs 1m32sec) ... I don't really understand why compaction was faster for 100k slots than for 10k

One option to make the longer time-to-purge less painful in dev contexts would be to add an arg to solana-ledger-tool purge that disables purging of the TransactionStatus and AddressSignatures column families.

carllin (Contributor) commented Jun 1, 2020

> compaction - 4m6sec (vs 1m32sec) ... I don't really understand why compaction was faster for 100k slots than for 10k

This may be because compaction is happening while you're purging; do the speeds look a lot better if you turn off automatic compaction?
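For anyone reproducing the benchmark, a minimal sketch of turning off automatic compaction with the rust rocksdb crate (a generic example, not Blockstore's actual open path or options):

```rust
use rocksdb::{Options, DB};

// Open a database with background compaction disabled so purge timings can be
// measured without compaction running concurrently.
fn open_without_auto_compaction(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // RocksDB will no longer schedule compactions on its own; compaction then
    // only happens when triggered explicitly (e.g. via compact_range).
    opts.set_disable_auto_compactions(true);
    DB::open(&opts, path)
}
```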

ledger/src/blockstore.rs (outdated)
}
let slot_meta = slot_meta.unwrap();

let entries: Vec<Vec<Entry>> = PAR_THREAD_POOL_ALL_CPUS.with(|thread_pool| {
Contributor

Maybe we can extract out this par_iter logic into a function that takes a thread pool argument, so it can be reused within get_slot_entries_with_shred_info

Contributor Author

This turned out to be a bit of a pain, since we return a Result in one case and just the Vec in another. Okay to leave as-is?

Contributor

Yeah no worries
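For reference, one hypothetical shape the suggested helper could take (not the code that landed): have it accept the thread pool and always return a Result, so the caller that only needs the Vec can unwrap or default at its end. All names here are illustrative.

```rust
use rayon::prelude::*;
use rayon::ThreadPool;

// Hypothetical helper: runs one fetch per completed range on the supplied
// pool. Returning Result everywhere sidesteps the Result-vs-Vec mismatch
// mentioned above; the infallible caller can simply unwrap_or_default().
fn fetch_ranges_parallel<T, E, F>(
    thread_pool: &ThreadPool,
    completed_ranges: &[(u32, u32)],
    fetch_range: F,
) -> Result<Vec<Vec<T>>, E>
where
    T: Send,
    E: Send,
    F: Fn(u32, u32) -> Result<Vec<T>, E> + Sync,
{
    thread_pool.install(|| {
        completed_ranges
            .par_iter()
            // collect() over Results short-circuits on the first error.
            .map(|&(start, end)| fetch_range(start, end))
            .collect()
    })
}
```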


let entries: Vec<Vec<Entry>> = PAR_THREAD_POOL_ALL_CPUS.with(|thread_pool| {
thread_pool.borrow().install(|| {
completed_ranges
carllin (Contributor) · Jun 1, 2020

Hmm out of curiosity, how big are these ranges?

I think it'll be interesting to see if we get better parallel perf by splitting up the work into chunks of slots here: https://github.com/solana-labs/solana/pull/10358/files#diff-9ed453cbc107691c26b7e060d05e2ab0R1875 among a thread pool, instead of using the thread pool for individual shred ranges as we currently have it.

Contributor Author

On MNB, completed_ranges.len() varies from ~80-150, mostly 100+.
Could we try splitting by chunks of slots in a later optimization?

Contributor

Ah yeah, if the ranges are that large, you probably won't be getting much parallel work, as that all goes to one thread.

As to the optimization, yeah for sure, I was just curious
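For later reference, a rough sketch of the chunks-of-slots approach being discussed (illustrative names and chunk size; not the merged code): parallelize across chunks of slots and let each worker handle its chunk serially.

```rust
use rayon::prelude::*;

// Each worker takes a contiguous chunk of slots and processes it serially;
// rayon balances the chunks across the pool.
fn process_slots_chunked<T, F>(slots: &[u64], per_slot: F) -> Vec<T>
where
    T: Send,
    F: Fn(u64) -> Vec<T> + Sync,
{
    slots
        .par_chunks(16) // chunk size is arbitrary here
        .flat_map(|chunk| {
            chunk
                .iter()
                .flat_map(|&slot| per_slot(slot))
                .collect::<Vec<T>>()
        })
        .collect()
}
```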

mvines added the v1.2 label Jun 2, 2020
codecov bot commented Jun 3, 2020

Codecov Report

Merging #10358 into master will increase coverage by 0.0%.
The diff coverage is 93.4%.

@@           Coverage Diff            @@
##           master   #10358    +/-   ##
========================================
  Coverage    81.4%    81.4%            
========================================
  Files         289      289            
  Lines       67315    67450   +135     
========================================
+ Hits        54811    54925   +114     
- Misses      12504    12525    +21     

@@ -0,0 +1,1111 @@
use super::*;

impl Blockstore {
Contributor

yay thanks for the refactor!
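As a side note on the refactor pattern itself, a self-contained illustration (not the actual Blockstore code): Rust allows impl blocks for one type to live in different modules, so the purge methods can sit in their own file and still extend Blockstore via use super::*;.

```rust
pub struct Blockstore;

// In the real code this module lives in its own file (the new file shown in
// the diff above); it is inlined here only so the sketch compiles on its own.
mod purge {
    use super::*;

    impl Blockstore {
        // Hypothetical signature, for illustration only.
        pub fn purge_slots_demo(&self, from_slot: u64, to_slot: u64) {
            println!("would purge slots {}..={}", from_slot, to_slot);
        }
    }
}

fn main() {
    Blockstore.purge_slots_demo(10, 20);
}
```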

CriesofCarrots merged commit eee9a08 into solana-labs:master Jun 3, 2020
mergify bot pushed a commit that referenced this pull request Jun 3, 2020
Purge TransactionStatus and AddressSignatures exactly from ledger-tool (#10358)

* Add failing test

* Add execution path to purge primary-index columns exactly

* Fail gracefully if older TransactionStatus rocksdb keys are present

* Remove columns_empty check for special columns

* Move blockstore purge methods to submodule

* Remove unused column empty check

(cherry picked from commit eee9a08)
CriesofCarrots deleted the exact-purge-deser branch July 24, 2020 22:46