Failsafes to prevent a consensus round from taking too long #5277
base: develop
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #5277 +/- ##
=======================================
Coverage 78.1% 78.1%
=======================================
Files 790 790
Lines 67607 67623 +16
Branches 8164 8167 +3
=======================================
+ Hits 52828 52842 +14
- Misses 14779 14781 +2
    newPosition = weight > p.avSTUCK_CONSENSUS_PCT;
else
    newPosition = false;
This is so simple that it's obviously correct.
This has been rewritten a bit
Still, the ending `newPosition = false;` remained, and I like that.
src/xrpld/consensus/Consensus.cpp
Outdated
@@ -181,6 +181,12 @@ checkConsensus(
        return ConsensusState::MovedOn;
    }

    if (currentAgreeTime > parms.ledgerMAX_CONSENSUS + previousAgreeTime)
I think that because of this condition here, we are unable to test the change in `DisputedTx.h`. Can you engineer timings such that we will test the last `newPosition = false` in `DisputedTx::updateVote` as well?
> I think that because of this condition here, we are unable to test the change in `DisputedTx.h`. Can you engineer timings such that we will test the last `newPosition = false` in `DisputedTx::updateVote` as well?

This has been revised, too.
f07992d to 76c27a0
… long - Require 100% agreement if the round takes 4x - Change all votes to "no" if the round takes 5x
* upstream/develop: Amendment `fixFrozenLPTokenTransfer` (5227) Improve git commit hash lookup (5225)
Would be cool to have a unit test for the last part of `DisputedTx.h`; not sure how realistic that request is. Approved in any case.
* upstream/develop: Updates Conan dependencies (5256)
04187d0 to 7c5822b
afba742 to 61e12a2
* upstream/develop: fix: Do not allow creating Permissioned Domains if credentials are not enabled (5275) fix: issues in `simulate` RPC (5265)
High Level Overview of Change
This PR, if merged, introduces two failsafes into the consensus logic to prevent a consensus round from remaining open indefinitely.
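The 15-second and 60-second bounds named in this overview can be read as a clamp on a timeout derived from the previous round. The following is only a minimal sketch under that assumption: `boundedRoundLimit` and its parameter are hypothetical names, and the snippet is not the code in this PR. Only the two constant names and their values come from the description.

```cpp
#include <algorithm>
#include <chrono>

using namespace std::chrono_literals;

// Bounds quoted in the PR description.
constexpr std::chrono::milliseconds ledgerMAX_CONSENSUS = 15s;
constexpr std::chrono::milliseconds ledgerABANDON_CONSENSUS = 60s;

// Hypothetical helper (an assumption, not taken from the PR): bound a
// timeout derived from the previous round so that it is never shorter
// than 15 seconds and never longer than 60 seconds.
constexpr std::chrono::milliseconds
boundedRoundLimit(std::chrono::milliseconds derivedFromPreviousRound)
{
    return std::clamp(
        derivedFromPreviousRound,
        ledgerMAX_CONSENSUS,
        ledgerABANDON_CONSENSUS);
}

// A very fast previous round cannot push the limit below the lower bound...
static_assert(boundedRoundLimit(2s) == 15s);
// ...and a very slow one cannot push it above the upper bound.
static_assert(boundedRoundLimit(10min) == 60s);
```

With a clamp of this shape, a value inside the range passes through unchanged, so only the extreme cases are affected.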
`ledgerMAX_CONSENSUS` (15 seconds) and `ledgerABANDON_CONSENSUS` (60 seconds). This prevents an unusually fast consensus round from being punished into aborting unusually early on the next round, and prevents the potential round time from growing without bound. For example, if one round takes 60 seconds, we don't want to let the next round run for 10 minutes.
Context of Change
At about 9:54pm UTC on 2/4/2025, the network successfully validated ledger 93927173, and started the consensus round for 93927174. That round did not end for over an hour.
The current evidence indicates that two things happened.
This led to a deadlock-like situation where every node was waiting for some other node to make a change, while none of the nodes were willing to change.
This decision algorithm has been in place for at least 8 years, and possibly since the first release of `rippled`. The odds of it happening were thought to be 0, but it turns out they're just very, very small.
Type of Change
This change is fully backward and forward compatible, and does not require an amendment.