-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only trigger bootstrap on topology change if node receives new shards #1868
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🏃💨
I don't quite get this "Node A completes bootstrapping first. It then waits for a snapshot to complete post-bootstrap (to ensure durability). Once it has ensured durability, it marks its shards as available which propagates a topology update to nodes B and C." If it's not receiving any shards, won't it not cause a topology update? |
@robskillington Basically every topology change (including marking shards as available) triggers a bootstrap on every single node in the cluster with the existing code path |
cfa0f0c
to
0052fb2
Compare
Codecov Report
@@ Coverage Diff @@
## master #1868 +/- ##
=========================================
+ Coverage 63.4% 65% +1.6%
=========================================
Files 1113 986 -127
Lines 104789 83990 -20799
=========================================
- Hits 66458 54622 -11836
+ Misses 34087 25484 -8603
+ Partials 4244 3884 -360
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it:
Currently a bootstrap will be triggered for each topology change, even if the node doesn't receive any new shards. These bootstraps are generally short-lived (since their is nothing to bootstrap) but they make the nodes flicker in and out of the "bootstrapped" state during topology changes which is confusing.
In addition, the fact that the nodes are flickering in and out of the "bootstrapped" state significantly delays the amount of time until a node appears "BootstrappedAndDurable". This is most pronounced during node removes where basically every node in the cluster is accepting a few new shards. Imagine a 4 node cluster where one node is removed and as a result nodes A,B, and C all receive some of node D's shards.
Node A completes bootstrapping first. It then waits for a snapshot to complete post-bootstrap (to ensure durability). Once it has ensured durability, it marks its shards as available which propagates a topology update to nodes B and C.
In the meantime, nodes B and C had also completed bootstrapping and were waiting for their post-bootstrap snapshot to complete, however, because they just received a topology update they will start a new bootstrap. This bootstrap will end very quickly (because their is no new data they need to acquire since they didn't get any new shards) but it will reset their "wait for a snapshot to complete post bootstrap" durability timer. Eventually node B or C will mark its shards as available and trigger the same issue for the other node.
In clusters with many nodes, the actual data streaming to remove a node may complete in 20 minutes but the nodes can spend hours and hours just interrupting each other durability timers until the topology change finally completes.
This P.R addresses this issue by only triggering bootstrap if their is a need to bootstrap (which is when new shards are acquired).
Special notes for your reviewer:
There is no additional tests because the desired behavior of being able to bootstrap all data from disk post-topology change is already well covered by the
TestClusterAddOneNodeCommitlog
integration test. As long as that test continues to pass then our logic for maintaining durability during topology changes is sound.Does this PR introduce a user-facing and/or backwards incompatible change?:
None
Does this PR require updating code package or user-facing documentation?:
None