Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test in banking stage for recording txs in last tick #34635

Merged
merged 3 commits into from
Jan 5, 2024

Conversation

jstarry
Copy link
Member

@jstarry jstarry commented Jan 3, 2024

Problem

There's a potential race condition in nonce handling due to the fact that the bank's blockhash queue gets updated before all transactions are committed. This race condition isn't under test but should be.

Summary of Changes

  • Add test to ensure that invalid nonce transactions are advanced properly during the last tick of a block
  • Add a comment explaining why the blockhash has to be fetched before recording transactions to poh

Fixes #

@jstarry jstarry force-pushed the test/check-last-blockhash-race branch from 1c62197 to de70de9 Compare January 3, 2024 03:14
core/src/banking_stage/consumer.rs Show resolved Hide resolved
core/src/banking_stage/consumer.rs Outdated Show resolved Hide resolved
core/src/banking_stage/consumer.rs Outdated Show resolved Hide resolved
core/src/banking_stage/consumer.rs Outdated Show resolved Hide resolved
Comment on lines +1160 to +1165
let record_response = poh_recorder.write().unwrap().record(
record.slot,
record.mixin,
record.transactions,
);
poh_recorder.write().unwrap().tick();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor
Would it make sense to lock the poh_recorder only once?

Suggested change
let record_response = poh_recorder.write().unwrap().record(
record.slot,
record.mixin,
record.transactions,
);
poh_recorder.write().unwrap().tick();
let record_response = {
let poh_recorder = poh_recorder.write().unwrap();
let response = poh_recorder.record(
record.slot,
record.mixin,
record.transactions,
);
poh_recorder.tick();
response
}:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally yes, but given this is test code, I passed on this

Comment on lines +1166 to +1168
if record.sender.send(record_response).is_err() {
panic!("Error returning mixin hash");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor
While it is unlikely that this will fail, could the error message contain any useful information?

Suggested change
if record.sender.send(record_response).is_err() {
panic!("Error returning mixin hash");
}
assert_eq!(
record.sender.send(record_response),
Ok(()),
"Error returning mixin hash"
}

core/src/banking_stage/consumer.rs Outdated Show resolved Hide resolved
assert_eq!(transactions_attempted_execution_count, 1);
assert_eq!(executed_transactions_count, 1);
assert_eq!(executed_with_successful_result_count, 0);
assert!(commit_transactions_result.is_ok());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor
Using assert_eq! would show the error content in case of a failure:

Suggested change
assert!(commit_transactions_result.is_ok());
assert_eq!(commit_transactions_result, Ok(()));

Or, if the successful result is more complex:

use assert_matches::assert_matches;

            /* ... */

            assert_matches!(commit_transactions_result, Ok(_));

Comment on lines 1224 to 1235
let mut done = false;
// read entries until I find mine, might be ticks...
while let Ok((_bank, (entry, _tick_height))) = entry_receiver.recv() {
if !entry.is_tick() {
assert_eq!(entry.transactions.len(), transactions.len());
done = true;
}
if done {
break;
}
}
assert!(done);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshedding

Suggested change
let mut done = false;
// read entries until I find mine, might be ticks...
while let Ok((_bank, (entry, _tick_height))) = entry_receiver.recv() {
if !entry.is_tick() {
assert_eq!(entry.transactions.len(), transactions.len());
done = true;
}
if done {
break;
}
}
assert!(done);
let my_enrty = entry_receiver
.iter()
.map(|(_bank, (entry, _tick_height))| entry)
.filter(|entry| !entry.is_tick())
.first()
.expect("A non-tick entry was inserted [where?]");
assert_eq!(my_entry.transactions.len(), transactions.len());

Maybe there is a better name than my_entry? But I do not fully understand the code to suggset it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks through the entries of a block to find one that has transactions in it.. since the test only sends one transaction, it's enough to know that the transaction was included. It's not really necessary to have this check because there are other checks for whether the tx was committed but it's here because I copied it from another test. It's not really important though, I think it's fine to leave as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is fine to leave it as is.
Though, I think, the version I wrote is functionally identical to the one in the PR.
If by "this check" you mean the .expect("A non-tick entry was inserted [where?]"); - existing version also has this check. It is the assert!(done) at the end.
I just thought, an explicit explanation is almost always better.

core/src/banking_stage/consumer.rs Show resolved Hide resolved
@jstarry
Copy link
Member Author

jstarry commented Jan 3, 2024

Thanks for the review @ilya-bobyr, I applied some of your suggestions. The others seemed too minor given that this is test code. Did you have any substantial feedback about what is being tested? The purpose of this change is to prevent race conditions so I'd appreciate if you comment on that rather than giving me so many minor suggestions.

Comment on lines +1129 to +1131
let ledger_path = get_tmp_ledger_path_auto_delete!();
{
let blockstore = Blockstore::open(ledger_path.path())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is covering most of the test body.
Is it because you want the blockstore to be destructed before the

Blockstore::destroy(ledger_path.path()).unwrap();

call?

While generally blocks are a better way to scope object lifetimes, when they are this long, it becomes harder to track the dependencies.
Maybe remove it and add an explicit drop() just before the destroy() call:

drop(blockstore);
Blockstore::destroy(ledger_path.path()).unwrap();

let recorder = poh_recorder.new_recorder();
let poh_recorder = Arc::new(RwLock::new(poh_recorder));

fn poh_tick_before_returning_record_response(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this function is very important to this test, it does not have direct dependencies on the test content.
In other words, it can be moved to the module level, and be a helper function, similar to the store_nonce_account().
Doing so, I think, could make the test shorter and, potentially, easier to read.

If you want to keep it internal to the test, it can also be moved outside the block that constraints the lifetime of the blockstore.

@ilya-bobyr
Copy link
Contributor

Thanks for the review @ilya-bobyr, I applied some of your suggestions. The others seemed too minor given that this is test code. Did you have any substantial feedback about what is being tested? The purpose of this change is to prevent race conditions so I'd appreciate if you comment on that rather than giving me so many minor suggestions.

To me, the test itself is very reasonable :)
Though, I may not understand all the details as well as, say, Andrew or other reviewers you have added.
Unless you want this in ASAP, maybe also wait for an approval from one of them.

They only argument for the test is that it is a bit long.
So, I think, possibly putting a bit of effort into making it shorter could be helpful.
I have left a comment that one of the helper functions could be moved out of the test.
I guess the rest of the setup is specialized to this test?
If not, using a shared setup functions could make it more obvious which part is really the "meat" of the test, and which is just a necessary scaffolding.

Copy link

codecov bot commented Jan 3, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (c1c857c) 81.8% compared to head (c178593) 81.8%.
Report is 30 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #34635    +/-   ##
========================================
  Coverage    81.8%    81.8%            
========================================
  Files         824      824            
  Lines      222260   222362   +102     
========================================
+ Hits       181885   181996   +111     
+ Misses      40375    40366     -9     

t-nelson
t-nelson previously approved these changes Jan 4, 2024
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks logically good to me. left a couple suggestions to hopefully better guide anyone who breaks the test

..
} = create_slow_genesis_config(10_000);
let bank = Bank::new_no_wallclock_throttle_for_tests(&genesis_config).0;
let pubkey = solana_sdk::pubkey::new_rand();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer Pubkey::new_unique()

.store(true, Ordering::Relaxed);
let _ = poh_simulator.join();

// check that the nonce was advanced to the current bank's last blockhash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// check that the nonce was advanced to the current bank's last blockhash
// check that the nonce was advanced to the current bank's last blockhash
// rather than the current bank's blockhash as would occur had the update
// blockhash been queried _after_ transaction recording

@@ -1065,6 +1092,161 @@ mod tests {
Blockstore::destroy(ledger_path.path()).unwrap();
}

#[test]
fn test_bank_process_and_record_transactions_in_last_tick() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we name the test to more clearly state that it is enforcing the nonce update blockhash query ordering rather than a more generic process that could have other side effects that we're not testing here? something like...

Suggested change
fn test_bank_process_and_record_transactions_in_last_tick() {
fn test_bank_nonce_update_blockhash_queried_before_transaction_record() {

apfitzge
apfitzge previously approved these changes Jan 4, 2024
Copy link
Contributor

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major comments from me, and no minors except what others have already mentioned.

@jstarry jstarry dismissed stale reviews from apfitzge and t-nelson via 2c21d17 January 5, 2024 08:33
@jstarry jstarry force-pushed the test/check-last-blockhash-race branch from 2c21d17 to c178593 Compare January 5, 2024 08:38
@jstarry
Copy link
Member Author

jstarry commented Jan 5, 2024

Great, updated with feedback, one more approval please!

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jstarry jstarry merged commit 47b4075 into solana-labs:master Jan 5, 2024
35 checks passed
@jstarry jstarry deleted the test/check-last-blockhash-race branch January 5, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants