-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abort query if not enough stake is connected #3345
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
yacovm
force-pushed
the
throttlePolls
branch
3 times, most recently
from
August 30, 2024 00:20
f795532
to
5ad5305
Compare
aaronbuchwald
approved these changes
Aug 30, 2024
yacovm
force-pushed
the
throttlePolls
branch
from
September 3, 2024 21:35
e4a44d1
to
08d50d1
Compare
marun
reviewed
Sep 4, 2024
yacovm
force-pushed
the
throttlePolls
branch
2 times, most recently
from
September 4, 2024 17:45
9d7193a
to
19a7551
Compare
marun
approved these changes
Sep 4, 2024
yacovm
force-pushed
the
throttlePolls
branch
3 times, most recently
from
September 5, 2024 17:04
31ce708
to
86d4b0c
Compare
yacovm
force-pushed
the
throttlePolls
branch
2 times, most recently
from
September 5, 2024 19:07
a922d0f
to
1529dde
Compare
This commit makes the snowman engine to abort early and avoid sending a query if the node considers the connected stake as insufficient to achieve a successful poll. The rational for this, is to prevent the node to try to agree on blocks when it is in a partition. The intent here is twofold: (1) To prevent needless resource expenditure, and (2) to prevent snowball from increasing its preference strength while the protocol has no way to make progress. The higher preference strength in a partition, the longer it takes to recover and achieve consensus once the partition heals. Signed-off-by: Yacov Manevich <[email protected]>
yacovm
force-pushed
the
throttlePolls
branch
from
September 5, 2024 19:26
1529dde
to
f20879e
Compare
StephenButtolph
approved these changes
Sep 6, 2024
michaelkaplan13
pushed a commit
that referenced
this pull request
Sep 11, 2024
Signed-off-by: Yacov Manevich <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request makes the snowman engine to abort early and avoid sending a query if the node considers the connected stake as insufficient to achieve a successful poll.
The rational for this, is to prevent the node to try to agree on blocks when it is in a partition.
The intent here is twofold: (1) To prevent needless resource expenditure, and (2) to prevent snowball from increasing its preference strength while the protocol has no way to make progress.
The higher preference strength in a partition, the longer it takes to recover and achieve consensus once the partition heals.
Why this should be merged
Without this change, nodes in a partition consistently try to poll each other and depending on the configuration, it may be that all nodes in a partition manage to increase the preference strength over time in snowball (but fail to finalize blocks, because they cannot consistently achieve the required confidence thresholds).
Once the nodes have increased the preference strength of the next block they prefer, even if the partition heals, it will take a long time for the nodes to change their preference, and the consensus will stall for a long time or until a restart of the nodes.
With this change, the nodes will avoid sending polls once they detect they are in a partition, and thus the preference strength of snowball won't increase needlessly.
How this works
Before sending a query, a node checks if it senses enough stake is reachable, and aborts early if it is not the case.
How this was tested
I ran a Fuji node with verbose logging level, and blocked its communication to 33% of the stake via
iptables
rules.Then, without this change, the node still tries to query other nodes.
However, with this change, the node aborts and does not query anymore:
Instead, it prints that it failed fast and aborted the query:
I also did a more comprehensive test:
I deployed a 4 node subnet on Fuji (testnet) and then setup a network partition where two nodes are on one side and the other two nodes are on the other.
Then I submitted transactions to some nodes on each side of the partition, and monitored a custom metric I added which measures the preference strength of snowball. Afterwards, I let the partition heal and attempted to submit more transactions to the network.
I repeated the experiment above for both a build from master and a build based on this fix.
In the build based on master, i observed the snowball preference strength go wild, and the network was not functional after the partition healed:
In the build based on this fix, the metric didn't change as the queries are not being sent in the first place.
After the partition healed, conflicting blocks were rejected as consensus was re-established: