Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Leader Rotation Logic to Address Edge Cases in Leader Selection #4798

Merged
merged 11 commits into from
Nov 27, 2024

Conversation

GheisMohammadi
Copy link
Contributor

@GheisMohammadi GheisMohammadi commented Nov 12, 2024

Issue

This PR refines the leader rotation logic in NthNextValidator to handle previously unaddressed edge cases that could lead to unexpected behavior or repeated selection of the same leader. In the refactored function for leader rotation, several corner cases were addressed to ensure robustness in the selection of the next validator.

One critical improvement was handling cases where next is greater than zero. In the original code, if the list wraps around from the last validator back to the beginning, there was a risk that the first validator could be skipped entirely. This is especially problematic when the validator list is circular, as it could lead to unfair distribution in leader selection. The refactored function avoids this by introducing attempts as a counter that methodically cycles through all nodes in publicKeys, ensuring no validator is skipped due to wrap-around.

Another addressed case involves scenarios where next is zero, yet the pubKey of the current leader isn't found within publicKeys. In the original code, if the pubKey index was -1 (indicating it wasn’t present), this situation could cause a crash when trying to access an invalid index. The refactored version improves safety by first logging this absence as an error and, if next is zero, returning the first public key in the list as a fallback. This default ensures stability, allowing the function to handle missing keys gracefully without interrupting leader selection.

Finally, the revised code prevents repeatedly selecting the same leader when a different validator isn’t readily available. In the original function, if the loop failed to find a unique validator, it would repeatedly return the same pubKey, potentially leading to repetitive selection cycles. The refactored version adds a check to ensure that only a distinct validator is returned. Additionally, a limit on attempts guarantees that the function terminates if no unique validator is found after a full cycle, logging a warning as an indication of this fallback behavior. This effectively prevents inefficient leader selection and ensures smoother rotations.

This PR introduces changes to the leader rotation logic that require a hard fork (HF) for full network-wide consistency. Because this update impacts how validators are selected, implementing it without a hard fork would lead to differing leader rotations among nodes running different versions of the code, potentially causing consensus issues.

For now, the hard fork epoch has been set to "TBD" (to be determined) to allow further discussion and coordination among stakeholders. This placeholder will be updated once the team agrees on an appropriate epoch for deployment.

This PR addresses 3 corner cases mentioned in issue #4796

@GheisMohammadi GheisMohammadi self-assigned this Nov 12, 2024
@sophoah sophoah requested a review from Frozen November 19, 2024 07:48
@sophoah
Copy link
Contributor

sophoah commented Nov 19, 2024

@GheisMohammadi please check the travis failure

@Frozen
Copy link
Contributor

Frozen commented Nov 21, 2024

New functionality contains 0 tests coverage.

@sophoah
Copy link
Contributor

sophoah commented Nov 21, 2024

New functionality contains 0 tests coverage.

Good point though it's not a new feature. Is it possible to write test cases here cc @GheisMohammadi

@GheisMohammadi
Copy link
Contributor Author

New functionality contains 0 tests coverage.

Good point though it's not a new feature. Is it possible to write test cases here cc @GheisMohammadi

I added tests for three edge cases to demonstrate how Leader Rotation v1 fails to handle them. Additionally, I created the same tests for Leader Rotation v2 to validate that it successfully addresses all three edge cases.

Copy link
Contributor

@sophoah sophoah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this one but we'll need a different iteration of NthNextValidatorV2 as discussed in the protocol team meeting

@sophoah
Copy link
Contributor

sophoah commented Nov 22, 2024

@Frozen please review/comment/approve

@sophoah sophoah merged commit 6e7b891 into dev Nov 27, 2024
4 checks passed
@sophoah sophoah deleted the hf/leader_rotation_v2 branch November 27, 2024 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants