-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add switch threshold local cluster tests + fix check_switch_threshold
#10343
Conversation
let alive_stake_1 = total_alive_stake / 2; | ||
let alive_stake_2 = total_alive_stake - alive_stake_1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aeyakovenko, @sakridge I think this test highlights a potential liveness problem caused by the interaction of the fork choice rule and the switching threshold (you can adjust the max_failures
stake and the distribution of alive_stake_1
and alive_stake_2
to test the below example).
Recall the current behavior when a validator on fork A
detects a heavier fork B
, the validator will
- Stop voting on its own fork
A
(to prevent liveness issues where you can never generate a switching proof b/c your own vote is a moving target) - Continue generating blocks on its own fork
A
- Wait for a switching proof to fork
B
, at which point it will vote and reset toB
Now consider the below example, where the switching threshold is 38%
(need >38%
to switch). In the example, a partition occurs such that:
- One partition has
<=38%
of the stake but is heavier (it has a lucky run of slots) - One partition has
>62%
but is lighter
4- 5- 6- 7-8-9-10 (38%)
/
1----2
\
3-11 (62%)
In the above example, during the partition there was a brief run where the 38%
partition got more blocks.
Then when the partitions resolve, partition 2
will stop voting and wait for a switching proof to the heavier partition 1
(code here: https://github.com/solana-labs/solana/blob/master/core/src/replay_stage.rs#L1451-L1459) but 1
will never have enough stake for a switching proof).
Note that even though partition 2
will eventually generate more blocks than partition 1
, validators on partition 2
are not voting, so the weight doesn't catch up to partition 1
(original design was based on fact that fork with more stake/slots would eventually catch up due to generating more blocks, but this reasoning seems flawed) because partitioin 1
is both generating blocks and voting (adding a voting bank increases fork weight by a factor of 2^n, simply adding unvoted banks only increases the fork weight linearly)
Ideally we would want to adjust the fork choice rule such that 38%
switch over to the 62%
fork because it has more of the most recent stake voting/building on it, even though our current fork weight may say otherwise.
For example, what if we tracked the most recent
votes landed by each validator, and updated these as validators voted on blocks on same/different forks? Then when considering two forks, traverse the tree, picking the fork that has the greatest amount of most recent
stake voting on top of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, i see. What's weird is that 38% fork can construct a switching proof to the 62% fork, but its lighter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the accumulation of weight ignores the expiration of votes. that's part of the problem. because eventually the 38% fork would expire all their votes and the 62 will catch up.
Codecov Report
@@ Coverage Diff @@
## master #10343 +/- ##
========================================
Coverage 81.2% 81.2%
========================================
Files 288 288
Lines 67093 67171 +78
========================================
+ Hits 54481 54596 +115
+ Misses 12612 12575 -37 |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This stale pull request has been automatically closed. Thank you for your contributions. |
Problem
BankForks.ancestors()
can now return withkeys < root
.Summary of Changes
check_switch_threshold
checks, update unit testsFixes #