-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance Leader Rotation Logic to Address Edge Cases in Leader Selection #4798
Conversation
@GheisMohammadi please check the travis failure |
83cb640
to
d5d8358
Compare
New functionality contains 0 tests coverage. |
Good point though it's not a new feature. Is it possible to write test cases here cc @GheisMohammadi |
…ow it skips some validators
I added tests for three edge cases to demonstrate how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this one but we'll need a different iteration of NthNextValidatorV2 as discussed in the protocol team meeting
@Frozen please review/comment/approve |
Issue
This PR refines the leader rotation logic in
NthNextValidator
to handle previously unaddressed edge cases that could lead to unexpected behavior or repeated selection of the same leader. In the refactored function for leader rotation, several corner cases were addressed to ensure robustness in the selection of the next validator.One critical improvement was handling cases where
next
is greater than zero. In the original code, if the list wraps around from the last validator back to the beginning, there was a risk that the first validator could be skipped entirely. This is especially problematic when the validator list is circular, as it could lead to unfair distribution in leader selection. The refactored function avoids this by introducing attempts as a counter that methodically cycles through all nodes inpublicKeys
, ensuring no validator is skipped due to wrap-around.Another addressed case involves scenarios where
next
is zero, yet thepubKey
of the current leader isn't found withinpublicKeys
. In the original code, if thepubKey
index was -1 (indicating it wasn’t present), this situation could cause a crash when trying to access an invalid index. The refactored version improves safety by first logging this absence as an error and, ifnext
is zero, returning the first public key in the list as a fallback. This default ensures stability, allowing the function to handle missing keys gracefully without interrupting leader selection.Finally, the revised code prevents repeatedly selecting the same leader when a different validator isn’t readily available. In the original function, if the loop failed to find a unique validator, it would repeatedly return the same
pubKey
, potentially leading to repetitive selection cycles. The refactored version adds a check to ensure that only a distinct validator is returned. Additionally, a limit on attempts guarantees that the function terminates if no unique validator is found after a full cycle, logging a warning as an indication of this fallback behavior. This effectively prevents inefficient leader selection and ensures smoother rotations.This PR introduces changes to the leader rotation logic that require a hard fork (HF) for full network-wide consistency. Because this update impacts how validators are selected, implementing it without a hard fork would lead to differing leader rotations among nodes running different versions of the code, potentially causing consensus issues.
For now, the hard fork epoch has been set to "TBD" (to be determined) to allow further discussion and coordination among stakeholders. This placeholder will be updated once the team agrees on an appropriate epoch for deployment.
This PR addresses 3 corner cases mentioned in issue #4796