
Define InstalledScheduler::wait_for_termination() #33922

Merged
3 commits merged into solana-labs:master on Oct 31, 2023

Conversation

@ryoqun (Member) commented Oct 29, 2023

Problem

The upcoming unified scheduler is async by nature, so the replay stage must eventually block at block boundaries in some way to wait for tx execution completion. However, the minimal InstalledScheduler (introduced in #33875) doesn't provide any such way.

Summary of Changes

Implement it and wire it exhaustively into all the relevant blocking points.

(fyi, planned next prs: 1. introduce InstalledSchedulerPool 2. introduce SchedulerPool as its actual impl)

extracted from: #33070
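
For orientation, here is a minimal sketch of the shape of the new API, pieced together from the excerpts quoted later in this thread. The exact trait bounds, imports, return type, and the non-paused WaitReason variant names are assumptions for illustration, not the code from this PR:

// Sketch only. ResultWithTimings and the start of WaitReason are quoted
// verbatim further down; everything else here is an assumption.
use solana_program_runtime::timings::ExecuteTimings;
use solana_sdk::transaction::Result;

pub type ResultWithTimings = (Result<()>, ExecuteTimings);

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum WaitReason {
    // Illustrative, non-exhaustive variants: one final reason and one pause
    // reason (a pause reason must exist, given is_paused() and
    // wait_for_paused_scheduler() in the diff excerpts below).
    TerminatedToFreeze,
    PausedForRecentBlockhash,
}

impl WaitReason {
    pub fn is_paused(&self) -> bool {
        matches!(self, WaitReason::PausedForRecentBlockhash)
    }
}

pub trait InstalledScheduler: std::fmt::Debug + Send + Sync {
    /// Block the caller until every transaction scheduled so far has finished
    /// executing, returning the accumulated result and timings if there was
    /// any work. A paused reason leaves the scheduler installed and usable;
    /// any other reason retires it.
    fn wait_for_termination(&mut self, reason: &WaitReason) -> Option<ResultWithTimings>;
}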

@@ -4185,7 +4186,11 @@ impl Bank {
     /// Register a new recent blockhash in the bank's recent blockhash queue. Called when a bank
     /// reaches its max tick height. Can be called by tests to get new blockhashes for transaction
     /// processing without advancing to a new bank slot.
-    pub fn register_recent_blockhash(&self, blockhash: &Hash) {
+    fn register_recent_blockhash(&self, blockhash: &Hash, scheduler: &InstalledSchedulerRwLock) {
@ryoqun (Member, Author) commented Oct 29, 2023:

pub isn't needed anymore as of #33880..

also, this internal fn should only ever be called by register_tick, even more so after this new scheduler arg. in other words, any needed blocking must be done with proper scheduler plumbing.

    .as_mut()
    .and_then(|scheduler| scheduler.wait_for_termination(&reason));
if !reason.is_paused() {
    drop(scheduler.take().expect("scheduler after waiting"));
@ryoqun (Member, Author) commented Oct 29, 2023:

this meaningless drop() and .take() pair will be replaced with SchedulerPool::return_to_pool() in a later pr.
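
A hedged sketch of the overall shape of this wait-and-drop block, reusing the types from the sketch near the top of the thread. The wrapping function and the Option-held boxed scheduler are illustrative, not the exact code in this PR:

// Illustrative only: the two quoted lines above sit inside logic of roughly
// this shape.
fn wait_for_scheduler(
    scheduler: &mut Option<Box<dyn InstalledScheduler>>,
    reason: WaitReason,
) -> Option<ResultWithTimings> {
    let result_with_timings = scheduler
        .as_mut()
        .and_then(|scheduler| scheduler.wait_for_termination(&reason));
    if !reason.is_paused() {
        // A non-paused reason consumes the scheduler. For now it is simply
        // dropped; a later pr is expected to hand it back to a pool via
        // something like SchedulerPool::return_to_pool() instead.
        drop(scheduler.take().expect("scheduler after waiting"));
    }
    result_with_timings
}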

@ryoqun requested a review from @apfitzge on October 29, 2023 07:20
Comment on lines +4190 to +4192
// This is needed because recent_blockhash updates necessitate synchronizations for
// consistent tx check_age handling.
BankWithScheduler::wait_for_paused_scheduler(self, scheduler);
@ryoqun (Member, Author) commented Oct 29, 2023:

this is indeed very subtle yet crucial. there are tests for this: test_scheduler_schedule_execution_recent_blockhash_edge_case_{with,without}_race in #33070
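
Putting the diff excerpts above together, the guarded update reads roughly as follows. Only the signature, the comment, and the wait_for_paused_scheduler() call are quoted from the PR; the rest of the body is elided:

impl Bank {
    fn register_recent_blockhash(&self, blockhash: &Hash, scheduler: &InstalledSchedulerRwLock) {
        // This is needed because recent_blockhash updates necessitate synchronizations for
        // consistent tx check_age handling.
        BankWithScheduler::wait_for_paused_scheduler(self, scheduler);
        // ...push `blockhash` into the bank's blockhash queue (elided)...
    }
}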

A contributor commented:

So I'm trying to summarize the reason this is necessary.

Essentially, in the loop in blockstore_processor.rs, scheduling no longer blocks our transaction processing. This means we will end up scheduling until we hit the final tick, at which point we register the new bank hash.

We need to make sure the blockhash_queue of our bank is not updated until after all the scheduled transactions have completed, because if a tx w/ recent blockhash equal to the current blockhash was scheduled, we'd get inconsistent results based on when it was executed rather than when it was scheduled.

I think that is correct, but please let me know if it is not.
If that is the case, why did we choose this approach instead of checking the blockhash at schedule time?
afaik the blockhash queue is only updated at the end of a block, i.e. when we should not be scheduling any more txs for that slot.

@ryoqun (Member, Author) commented Oct 31, 2023:

> Essentially, in the loop in blockstore_processor.rs, scheduling no longer blocks our transaction processing. This means we will end up scheduling until we hit the final tick, at which point we register the new bank hash.
>
> We need to make sure the blockhash_queue of our bank is not updated until after all the scheduled transactions have completed, because if a tx w/ recent blockhash equal to the current blockhash was scheduled, we'd get inconsistent results based on when it was executed rather than when it was scheduled.

that's a correct understanding.

> If that is the case, why did we choose this approach instead of checking the blockhash at schedule time?
> afaik the blockhash queue is only updated at the end of a block, i.e. when we should not be scheduling any more txs for that slot.

there are two reasons:

  • implementation simplification and risk reduction (i.e. extracting Bank::check_age() is too dangerous without a rigorous code audit, which i want to avoid (esp. considering nonces). Certainly, this is a nice optimization opportunity; maybe future work).
  • the recent blockhashes are exposed to the runtime as a sysvar (SysvarRecentB1ockHashes11111111111111111111) in theory. "In theory" here means this isn't actually the case due to the existence of SysvarCache, which is updated only at the start of a slot. However, I don't want to rely on that peculiar behavior of the cache, to avoid fragility. Again, this is fixable, but that will be (distant) future work.

Alternatively, we could route the recent blockhash update via the scheduler like this:

enum SchedulerTask {
    ExecuteTransaction(/* txes... */),
    UpdateRecentBlockhash(Hash),
}

so that this particular blocking call could be consolidated into the replay stage's wait_for_completed_scheduler(). but again, this is out of scope for now, due to the need to audit any potentially affected access timings to the bank's recent blockhashes and their implications.
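
As a hedged sketch of how that routing might look, assuming the SchedulerTask enum above is fed through an ordinary FIFO channel (the channel and the function name here are illustrative and not part of this PR or #33070):

// Illustrative only: because the channel is FIFO, the blockhash update is
// applied only after every previously sent ExecuteTransaction task has been
// dequeued, giving the same ordering guarantee as the explicit pause in this PR.
use crossbeam_channel::Sender;
use solana_sdk::hash::Hash;

fn on_max_tick_height(task_sender: &Sender<SchedulerTask>, new_blockhash: Hash) {
    task_sender
        .send(SchedulerTask::UpdateRecentBlockhash(new_blockhash))
        .expect("scheduler task receiver alive");
}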

codecov bot commented Oct 29, 2023

Codecov Report

Merging #33922 (507c502) into master (cdc2841) will decrease coverage by 0.1%.
The diff coverage is 82.7%.

@@            Coverage Diff            @@
##           master   #33922     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         809      809             
  Lines      218060   218241    +181     
=========================================
+ Hits       178630   178769    +139     
- Misses      39430    39472     +42     

(3 resolved review threads on runtime/src/installed_scheduler_pool.rs; one outdated)
pub type ResultWithTimings = (Result<()>, ExecuteTimings);

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum WaitReason {
A contributor commented:

Could you also comment on this from the call-side perspective? It's unclear to me: when I call wait_for_termination with one of these reasons... is that me telling the scheduler to terminate with that reason, or is it just waiting for that reason to happen?

@ryoqun (Member, Author) replied:

did my best: 507c502
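
For readers skimming the thread, the gist of the clarified semantics, shown with illustrative call sites (the variant names follow the sketch near the top of the thread; they are assumptions, not the wording of the added doc comments):

// The caller picks the reason; wait_for_termination() never decides anything
// on its own. It simply blocks until all already-scheduled transactions have
// finished. A non-paused reason then retires the scheduler, while a paused
// reason (e.g. before a recent-blockhash update) leaves it installed.
fn example_call_sites(scheduler: &mut dyn InstalledScheduler) {
    // End of block, before freezing the bank:
    let _result_with_timings = scheduler.wait_for_termination(&WaitReason::TerminatedToFreeze);

    // Before register_recent_blockhash(), so tx check_age stays consistent:
    let _still_installed = scheduler.wait_for_termination(&WaitReason::PausedForRecentBlockhash);
}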

@ryoqun (Member, Author) commented Oct 30, 2023

@apfitzge thanks for raising the right questions. hope my newly added in-source comments make your confusion vanish...

@apfitzge (Contributor) left a comment:

Happy with the changes. Tests are very clear and easy to follow with the mocked scheduler behavior.

@ryoqun merged commit 136ab21 into solana-labs:master on Oct 31, 2023.
43 checks passed