Abort query if not enough stake is connected #3345

yacovm · 2024-08-28T22:41:29Z

This pull request makes the snowman engine abort early and skip sending a query when the node considers its connected stake insufficient to achieve a successful poll.

The rationale is to prevent the node from trying to agree on blocks while it is in a partition.

The intent here is twofold: (1) to prevent needless resource expenditure, and (2) to prevent snowball from increasing its preference strength while the protocol has no way to make progress.

The higher the preference strength grows during a partition, the longer it takes to recover and achieve consensus once the partition heals.

Why this should be merged

Without this change, nodes in a partition keep polling each other. Depending on the configuration, all nodes in a partition may manage to increase their snowball preference strength over time, yet fail to finalize blocks because they cannot consistently reach the required confidence thresholds.
Once the nodes have increased the preference strength of the block they prefer next, they take a long time to change their preference even after the partition heals, so consensus stalls for a long time or until the nodes are restarted.

With this change, nodes stop sending polls once they detect that they are in a partition, so the snowball preference strength does not increase needlessly.
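The failure mode described above can be illustrated with a toy model. This is a heavily simplified sketch, not the snowball implementation in this repository: the `toyState` type, the `recordPoll` and `simulate` functions, and the parameter values (`alphaPreference`, `alphaConfidence`, `beta`) are all hypothetical and chosen only for illustration. It assumes that a poll reaching the preference threshold bumps the preference strength, while finalization needs `beta` consecutive polls that also reach the higher confidence threshold.

```go
package main

import "fmt"

// toyState is a hypothetical, simplified stand-in for one snowball instance.
type toyState struct {
	preferenceStrength int  // how strongly the node prefers its current block
	confidence         int  // consecutive polls that reached alphaConfidence
	finalized          bool // true once confidence reached beta
}

// recordPoll applies the result of one poll in which `votes` sampled peers
// voted for the currently preferred block.
func (s *toyState) recordPoll(votes, alphaPreference, alphaConfidence, beta int) {
	if s.finalized {
		return
	}
	if votes < alphaPreference {
		// Unsuccessful poll: nothing is strengthened, confidence resets.
		s.confidence = 0
		return
	}
	s.preferenceStrength++
	if votes < alphaConfidence {
		// Enough votes to strengthen the preference, not enough for confidence.
		s.confidence = 0
		return
	}
	s.confidence++
	if s.confidence >= beta {
		s.finalized = true
	}
}

// simulate runs `polls` identical polls with illustrative, assumed thresholds.
func simulate(polls, votes int) toyState {
	const (
		alphaPreference = 11 // assumed value
		alphaConfidence = 15 // assumed value
		beta            = 20 // assumed value
	)
	var s toyState
	for i := 0; i < polls; i++ {
		s.recordPoll(votes, alphaPreference, alphaConfidence, beta)
	}
	return s
}

func main() {
	// Partitioned node: every poll lands between the two thresholds.
	s := simulate(1000, 12)
	fmt.Printf("strength=%d confidence=%d finalized=%t\n",
		s.preferenceStrength, s.confidence, s.finalized)
}
```

In this model a partitioned node that repeatedly gets 12 votes ends up with a preference strength of 1000 and nothing finalized, which is the state that is slow to unwind after the partition heals.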

How this works

Before sending a query, a node checks whether it considers enough stake reachable, and aborts early if that is not the case.
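A minimal sketch of such a check is shown below. It is not the code from this pull request: the `params` type, the `shouldAbortQuery` function, and the choice of threshold (the confidence quorum divided by the sample size) are assumptions made for illustration, since the description does not state the exact threshold used.

```go
package main

import "fmt"

// params holds hypothetical consensus parameters used by this sketch.
type params struct {
	K               int // number of validators sampled per poll (assumed)
	AlphaConfidence int // votes needed for a poll to increase confidence (assumed)
}

// shouldAbortQuery reports whether a query should be skipped because the
// fraction of connected stake is below the fraction of the sample that a
// successful poll would need.
func shouldAbortQuery(connectedStake, totalStake uint64, p params) bool {
	if totalStake == 0 || p.K <= 0 {
		// Nothing to sample from: a poll cannot succeed.
		return true
	}
	connectedRatio := float64(connectedStake) / float64(totalStake)
	requiredRatio := float64(p.AlphaConfidence) / float64(p.K)
	return connectedRatio < requiredRatio
}

func main() {
	p := params{K: 20, AlphaConfidence: 15} // illustrative values

	// 67% of stake connected, 75% required: abort.
	fmt.Println(shouldAbortQuery(67, 100, p)) // true

	// 90% of stake connected: send the query as usual.
	fmt.Println(shouldAbortQuery(90, 100, p)) // false
}
```

Keeping the check as a pure function of the connected stake and the parameters makes it cheap to call right before each query and easy to unit test.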

How this was tested

I ran a Fuji node with the verbose logging level and blocked its communication with 33% of the stake using iptables rules.
Without this change, the node still tries to query other nodes.
With this change, the node aborts and no longer sends queries. Instead, it logs that it failed fast and aborted the query.

I also did a more comprehensive test:

I deployed a 4-node subnet on Fuji (testnet) and then set up a network partition with two nodes on one side and the other two nodes on the other side.
Then I submitted transactions to some nodes on each side of the partition, and monitored a custom metric I added which measures the preference strength of snowball. Afterwards, I let the partition heal and attempted to submit more transactions to the network.

I repeated the experiment above for both a build from master and a build based on this fix.

In the build based on master, I observed the snowball preference strength climb unchecked, and the network was not functional after the partition healed.

In the build based on this fix, the metric did not change, because the queries are not sent in the first place.
After the partition healed, consensus was re-established and conflicting blocks were rejected.

snow/engine/snowman/engine_test.go

snow/engine/snowman/engine.go

Signed-off-by: Yacov Manevich <[email protected]>

yacovm requested a review from StephenButtolph as a code owner August 28, 2024 22:41

yacovm marked this pull request as draft August 28, 2024 22:41

yacovm force-pushed the throttlePolls branch 3 times, most recently from f795532 to 5ad5305 Compare August 30, 2024 00:20

yacovm changed the title ~~Throttle polls~~ Abort query if not enough stake is connected Aug 30, 2024

yacovm force-pushed the throttlePolls branch from 5ad5305 to ca6ea16 Compare August 30, 2024 00:33

yacovm marked this pull request as ready for review August 30, 2024 00:34

aaronbuchwald reviewed Aug 30, 2024

snow/engine/snowman/engine_test.go (outdated, resolved)

yacovm force-pushed the throttlePolls branch from ca6ea16 to ab364ae Compare August 30, 2024 13:49

aaronbuchwald reviewed Aug 30, 2024

snow/engine/snowman/engine_test.go (outdated, resolved)

yacovm force-pushed the throttlePolls branch from ab364ae to 3fe75a9 Compare August 30, 2024 16:30

aaronbuchwald approved these changes Aug 30, 2024

yacovm force-pushed the throttlePolls branch from 3fe75a9 to e4a44d1 Compare August 30, 2024 16:36

yacovm self-assigned this Aug 30, 2024

yacovm force-pushed the throttlePolls branch from e4a44d1 to 08d50d1 Compare September 3, 2024 21:35

marun reviewed Sep 4, 2024

snow/engine/snowman/engine_test.go (outdated, resolved)

yacovm force-pushed the throttlePolls branch 2 times, most recently from 9d7193a to 19a7551 Compare September 4, 2024 17:45

marun approved these changes Sep 4, 2024

yacovm force-pushed the throttlePolls branch 3 times, most recently from 31ce708 to 86d4b0c Compare September 5, 2024 17:04

StephenButtolph reviewed Sep 5, 2024

snow/engine/snowman/engine.go (resolved)

snow/engine/snowman/engine.go (outdated, resolved)

snow/engine/snowman/engine_test.go (resolved)

yacovm force-pushed the throttlePolls branch 2 times, most recently from a922d0f to 1529dde Compare September 5, 2024 19:07

yacovm force-pushed the throttlePolls branch from 1529dde to f20879e Compare September 5, 2024 19:26

StephenButtolph approved these changes Sep 6, 2024

StephenButtolph added this pull request to the merge queue Sep 6, 2024

StephenButtolph added this to the v1.11.12 milestone Sep 6, 2024

StephenButtolph added the consensus and incident response labels Sep 6, 2024

Merged via the queue into ava-labs:master with commit 6e1a905 Sep 6, 2024
21 checks passed

michaelkaplan13 pushed a commit that referenced this pull request Sep 11, 2024

Abort query if not enough stake is connected (#3345)

df48206

Signed-off-by: Yacov Manevich <[email protected]>
