
add additional vote lockout stake threshold #34120

Merged: 2 commits merged into solana-labs:master on Dec 12, 2023

Conversation

bw-solana (Contributor)

Problem

We see several cases where the cluster gets partitioned in such a way that some set of validators end up voting down the minority fork and hit deep lockouts (up to 8 deep). This can stall OC for over a minute in the worst cases. See #34107 for one example.

It would be good to resolve all the reasons why validators can vote down the minority fork (deterministic fork selection, fixing the reasons why the leader's broadcast gets delayed, etc.), but it could take some time to resolve all of these. Treating the symptom by reducing how far down the minority fork validators will vote can mitigate the lockout period significantly.

Summary of Changes

When checking the vote stake threshold as part of the voting criteria, check both of the following (a rough sketch of the combined check follows this list):

  1. The 8th-deep tower vote is at the OC threshold (66%) <-- this is the existing check
  2. The 4th-deep tower vote is at the switch threshold (38%) <-- new check being added
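
A minimal sketch of what the combined check amounts to. The threshold values (66%, 38%, depths 8 and 4) come from the description above; the function and constant names below are illustrative, not the PR's actual code in core/src/consensus.rs:

```rust
// Sketch only: illustrative names, values taken from the PR description.
const VOTE_THRESHOLD_DEPTH: usize = 8; // existing check
const VOTE_THRESHOLD_SIZE: f64 = 0.66; // optimistic-confirmation threshold
const SHALLOW_THRESHOLD_DEPTH: usize = 4; // new check
const SWITCH_FORK_THRESHOLD: f64 = 0.38; // switch threshold

/// `stake_fraction_at_depth(d)` stands for the fraction of total stake that has
/// rolled up to the vote sitting `d` deep in our tower after applying the
/// proposed vote. Voting is allowed only if every (depth, threshold) pair passes.
fn passes_vote_stake_thresholds(stake_fraction_at_depth: impl Fn(usize) -> f64) -> bool {
    [
        (VOTE_THRESHOLD_DEPTH, VOTE_THRESHOLD_SIZE),
        (SHALLOW_THRESHOLD_DEPTH, SWITCH_FORK_THRESHOLD),
    ]
    .into_iter()
    .all(|(depth, threshold)| stake_fraction_at_depth(depth) >= threshold)
}

fn main() {
    // 70% of stake backs our 8-deep vote, but only 30% backs our 4-deep vote:
    // the old check alone would allow voting; the new combined check does not.
    let stake = |depth: usize| if depth >= VOTE_THRESHOLD_DEPTH { 0.70 } else { 0.30 };
    assert!(!passes_vote_stake_thresholds(stake));
}
```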

The switch threshold is chosen as the threshold because at that point the voting validator can be confident that either:

  1. The rest of the cluster is seeing something different from what I'm seeing
  2. The rest of the cluster just needs to land their votes or switch over to my fork

We often see cases where the validators that voted on the "majority" fork weren't able to land their votes because they got sent to a leader building off the "minority" fork. However, the minority fork has less than 38%, so those validators aren't allowed to switch over. This is the exact situation the new criterion should prevent, because either:

  1. The minority fork is at less than 33%, validators will stop voting at 4 deep, and thus the lockout will only be ~6 seconds (see the quick calculation after this list)
  2. The "minority" fork is at >=38% and the validators on the "majority" fork will actually switch over (they won't know that they would actually have been on the majority fork had all the votes landed).
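
A quick back-of-the-envelope calculation of why stopping at 4 deep bounds the damage, assuming doubling lockouts and a ~400 ms slot time (these assumptions are not spelled out in the PR text):

```rust
// Exponential lockouts: a vote with confirmation_count = n locks the validator
// out for 2^n slots. Stopping at 4 deep caps that at 16 slots (~6.4 s at 400 ms
// per slot), versus 256 slots (~102 s) if we keep voting down to 8 deep.
fn main() {
    let slot_ms = 400u64;
    for depth in [4u32, 8] {
        let lockout_slots = 2u64.pow(depth);
        println!(
            "{depth}-deep vote: locked out for {lockout_slots} slots (~{:.1} s)",
            (lockout_slots * slot_ms) as f64 / 1000.0
        );
    }
}
```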


codecov bot commented Nov 16, 2023

Codecov Report

Merging #34120 (3b9c8e7) into master (71c1782) will decrease coverage by 0.1%.
The diff coverage is 96.6%.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #34120     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         819      819             
  Lines      219884   219919     +35     
=========================================
+ Hits       180263   180270      +7     
- Misses      39621    39649     +28     

AshwinSekar (Contributor) left a comment:

Might want to add some tests that pass the 8-deep check but fail the 4-deep check.

(3 resolved review threads on core/src/consensus.rs)
if vote.confirmation_count() as usize > threshold_depth {
for old_vote in &vote_state.votes {
if old_vote.slot() == vote.slot()
&& old_vote.confirmation_count() == vote.confirmation_count()
A contributor commented:

@carllin do you remember why we do this? I found #7948 but since voted_stakes is rolled up across forks it shouldn't matter right?

A contributor replied:

I believe this was the issue: #7904 (comment)

If you bounce back to voting on the main fork after not voting for a while, your latest vote N on the main fork will pop off a lot of the stake of votes in the tower that would have rolled up to earlier votes in the tower. I think this is because, for every vote state, we apply the proposed latest vote N before rolling up to voted_stakes.
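
A conceptual sketch of the mechanism described above, using illustrative types rather than the actual VoteState/Tower API: applying a proposed vote at a much later slot pops the entries whose lockouts have expired, so their stake no longer rolls up to the deeper votes.

```rust
// Illustrative only; not the real vote-state code.
struct TowerVote {
    slot: u64,
    confirmation_count: u32,
}

impl TowerVote {
    // Doubling lockouts: a vote n confirmations deep is locked out for 2^n slots.
    fn last_locked_out_slot(&self) -> u64 {
        self.slot + 2u64.pow(self.confirmation_count)
    }
}

// Apply a proposed vote at `new_slot`: pop every tower entry whose lockout has
// already expired, then push the new vote. The popped entries' stake is what
// "falls off" before rolling up to voted_stakes.
fn apply_proposed_vote(tower: &mut Vec<TowerVote>, new_slot: u64) {
    while matches!(tower.last(), Some(top) if top.last_locked_out_slot() < new_slot) {
        tower.pop();
    }
    tower.push(TowerVote { slot: new_slot, confirmation_count: 1 });
}

fn main() {
    // A vote state that stopped accruing votes for a while, then a proposed vote
    // at slot 300 arrives: the shallow entries (small lockouts) expire and pop.
    let mut tower = vec![
        TowerVote { slot: 10, confirmation_count: 5 }, // locked out through slot 42
        TowerVote { slot: 20, confirmation_count: 3 }, // locked out through slot 28
        TowerVote { slot: 24, confirmation_count: 1 }, // locked out through slot 26
    ];
    apply_proposed_vote(&mut tower, 300);
    assert_eq!(tower.len(), 1); // only the new vote remains
}
```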

bw-solana (Contributor, Author):

> might want to add some tests that pass the 8 deep check but fail the 4 deep check.

Added 3 tests (sketched roughly after this list):

  1. Check the case where we're above both thresholds
  2. Check the case where we're above the threshold 8 deep but not 4 deep
  3. Check the case where we're above the threshold 4 deep but not 8 deep
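
Roughly, the shape of those three cases, with a hypothetical helper standing in for the real threshold check in core/src/consensus.rs:

```rust
// Illustrative only: a stand-in for the real dual-threshold check, exercised the
// same three ways the new tests are described above.
fn passes_thresholds(stake_at_8_deep: f64, stake_at_4_deep: f64) -> bool {
    stake_at_8_deep >= 0.66 && stake_at_4_deep >= 0.38
}

#[test]
fn vote_stake_threshold_cases() {
    assert!(passes_thresholds(0.70, 0.50)); // 1. above both thresholds
    assert!(!passes_thresholds(0.70, 0.30)); // 2. above 8-deep, below 4-deep
    assert!(!passes_thresholds(0.50, 0.40)); // 3. above 4-deep, below 8-deep
}
```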

AshwinSekar (Contributor) left a comment:

Looks good to me, but I'll wait for Carl to weigh in.
A couple of things:

  • I think we should test this out with the delay active.
  • Although not strictly necessary, we might want to feature flag this as it changes voting behavior. Perhaps running with the delay and a 50/50 split with the new threshold turned on could give us some insight into whether it's necessary.

(resolved review thread on core/src/consensus.rs)
pub fn check_vote_stake_threshold(
/// Checks a single vote threshold for `slot`
fn check_vote_stake_threshold(
vote: Option<&Lockout>,
A contributor commented:

Nit: maybe rename to threshold_vote so that it's not implied that the vote referenced by vote_state_before_applying_vote is this one.

fork_stake,
total_stake
);
if vote.confirmation_count() as usize > threshold_depth {
A contributor commented:

Just realized that at this point, if vote is Some, its confirmation count has to be greater than threshold_depth, but it makes sense to minimize changes here.


carllin (Contributor) commented Nov 17, 2023:

> we might want to feature flag this as it changes voting behavior.

Definitely seems like something we should feature flag, because I don't think it's something that can be adequately tested by slow rollout, i.e., the percentage of people following one threshold or the other probably matters.

We should also spin up a cluster with ~20 nodes and run the partition tests with a 50-50 split and a 30-70 split. I think those are generally good at finding any consensus issues.

bw-solana (Contributor, Author):

Added feature gate to control checking this additional threshold: #34158

diman-io (Contributor):

I ❤️ this. I fully support it.
In my own research, I also came to the conclusion that there should be multiple levels of decision-making depending on the observed state of the cluster.

I've been referring to this concept as Votes Funnel, drawing an analogy with Sales Funnel. However, Forks Funnel might be a more accurate term. 😃

carllin previously approved these changes Nov 19, 2023
(resolved review thread on core/src/consensus.rs)
@@ -732,6 +732,10 @@ pub mod enable_zk_transfer_with_fee {
solana_sdk::declare_id!("zkNLP7EQALfC1TYeB3biDU7akDckj8iPkvh9y2Mt2K3");
}

pub mod additional_vote_stake_threshold {
A contributor commented:

These changes shouldn't need a feature gate. The FD team and operators would, I'm sure, be happy if we didn't break consensus unnecessarily.

bw-solana (Contributor, Author):

Removed the feature gate

bw-solana (Contributor, Author):

Addressed the latest feedback to re-order the stake threshold checks and remove the feature gate.

Changes have been tested against (1) 10-node cluster partition testing w/ 50/50 and 70/30 stake splits and (2) a 10-node cluster w/ broadcast delay. The cluster seems to be able to recover from partitions lasting up to several minutes. Lockouts also appear to be limited to 4 deep in the broadcast delay case.

AshwinSekar (Contributor) left a comment:

I'll await Carl's final approval, but the successful results observed in the test clusters give me high confidence in the change.

bw-solana merged commit 07f3883 into solana-labs:master on Dec 12, 2023. 33 checks passed.
bw-solana (Contributor, Author):

Following up on this: I looked at tower height metrics on testnet over the last few days (2/25/24 to 2/27/24), and things are looking much better than they did a few months back.

I only saw a couple of places where tower height rose to ~80, and neither of them appeared to be due to 8-deep lockout issues (it seemed like some short-lived partitioning combined w/ delinquent leaders).

We were previously seeing root stalls several seconds long a few times every day, so it seems like the additional check is helping 👍
