Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug fix: timing regressions in Cluster.JoinAsync methods #6471

Merged
merged 5 commits into from
Feb 28, 2023

Conversation

Aaronontheweb
Copy link
Member

@Aaronontheweb Aaronontheweb commented Feb 28, 2023

Changes

We used the cluster.seed-node-timeout property incorrectly in #6033 - that is a "how long did it take from me to hear back from a seed" setting, but we're using it like a "how much time do I have to join the cluster?" setting.

Added a function designed to give us at least 20s of leeway if the user does not provide a CancellationToken of their own.

If the user does provide their own CancellationToken, that should be the single source of truth for this operation - second-guessing it without own un-overridable timeout adds zero value.

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

We used the `cluster.seed-node-timeout` property incorrectly in akkadotnet#6033 - that is a "how long did it take from me to hear back from a seed" setting, but we're using it like a "how much time do I have to join the cluster?" setting.

Added a function designed to give us at least 20s of leeway.
@Aaronontheweb Aaronontheweb added this to the 1.5.0 milestone Feb 28, 2023
Copy link
Member Author

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed changes

CancellationTokenSource timeoutCts = null;

// if user has a CTS, it is the single source of truth - no second-guessing.
if (token == default)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't second-guess the user - if they have a CancellationToken, let that be the sole determinant of whether or not the operation succeeded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes, we need to second guess the user CancellationToken because it has other use other than providing a timeout value and it would change the method behaviour.


RegisterOnMemberUp(() =>
{
timeoutCts.Dispose();
timeoutCts?.Dispose();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not have a timeoutCts - null-check it here.

/// <summary>
/// Want at least 10-20 seconds of leeway here.
/// </summary>
private TimeSpan ComputeJoinTimeLimit()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5s, the currently used value, is unrealistically short - 10s-20s is more appropriate.

@Aaronontheweb Aaronontheweb merged commit 1a63cd9 into akkadotnet:dev Feb 28, 2023
@Aaronontheweb Aaronontheweb deleted the fix-ClusterJoinAsync branch February 28, 2023 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants