-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug fix: timing regressions in Cluster.JoinAsync
methods
#6471
bug fix: timing regressions in Cluster.JoinAsync
methods
#6471
Conversation
We used the `cluster.seed-node-timeout` property incorrectly in akkadotnet#6033 - that is a "how long did it take from me to hear back from a seed" setting, but we're using it like a "how much time do I have to join the cluster?" setting. Added a function designed to give us at least 20s of leeway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detailed changes
src/core/Akka.Cluster/Cluster.cs
Outdated
CancellationTokenSource timeoutCts = null; | ||
|
||
// if user has a CTS, it is the single source of truth - no second-guessing. | ||
if (token == default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't second-guess the user - if they have a CancellationToken
, let that be the sole determinant of whether or not the operation succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, yes, we need to second guess the user CancellationToken
because it has other use other than providing a timeout value and it would change the method behaviour.
src/core/Akka.Cluster/Cluster.cs
Outdated
|
||
RegisterOnMemberUp(() => | ||
{ | ||
timeoutCts.Dispose(); | ||
timeoutCts?.Dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not have a timeoutCts
- null-check it here.
/// <summary> | ||
/// Want at least 10-20 seconds of leeway here. | ||
/// </summary> | ||
private TimeSpan ComputeJoinTimeLimit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5s, the currently used value, is unrealistically short - 10s-20s is more appropriate.
Changes
We used the
cluster.seed-node-timeout
property incorrectly in #6033 - that is a "how long did it take from me to hear back from a seed" setting, but we're using it like a "how much time do I have to join the cluster?" setting.Added a function designed to give us at least 20s of leeway if the user does not provide a
CancellationToken
of their own.If the user does provide their own
CancellationToken
, that should be the single source of truth for this operation - second-guessing it without own un-overridable timeout adds zero value.Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):