-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Akka.Cluster.Tools.Singleton / Akka.Cluster.Sharding: fix duplicate shards caused by incorrect ClusterSingletonManager
HandOver
#7297
Akka.Cluster.Tools.Singleton / Akka.Cluster.Sharding: fix duplicate shards caused by incorrect ClusterSingletonManager
HandOver
#7297
Conversation
Eliminates the source of akkadotnet#6793, which was caused by using the incorrect ordering methodology when it came to determining which `ClusterSingletonManager` to hand-over to during member state transitions. close akkadotnet#6973 close akkadotnet#7196
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resolves one of the nastiest bugs we've ever debugged - I knew it would be a stupid problem like this, but tracking it down was a herculean effort. Hundreds of hours spent between @Arkatufus and myself.
_memberAgeComparer = settings.ConsiderAppVersion | ||
? MemberAgeOrdering.DescendingWithAppVersion | ||
: MemberAgeOrdering.Descending; | ||
_memberAgeComparer = Member.AgeOrdering; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bug fix - descending order of members by age means we always get THE YOUNGEST at the front of the membersByAge
list. Members MUST BE IN ASCENDING ORDER, OLDEST FIRST. I'm not sure how far back this bug goes - at least to v1.5. I'll add another comment after I've dug that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer needed, sticking with DRY especially since Member.AgeOrdering
is already thoroughly sanity checked via #7291
_memberAgeComparer = considerAppVersion | ||
? MemberAgeOrdering.DescendingWithAppVersion | ||
: MemberAgeOrdering.Descending; | ||
_memberAgeComparer = Member.AgeOrdering; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uses the same fix as the SingletonProxy
else | ||
{ | ||
Unhandled(message); | ||
case ClusterEvent.CurrentClusterState state: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReSharper'd this to use a switch
statement. No biggie.
@@ -71,11 +71,6 @@ public TestException(string message, Exception innerEx) | |||
: base(message, innerEx) | |||
{ | |||
} | |||
|
|||
protected TestException(SerializationInfo info, StreamingContext context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obsolete, deleted it to remove a build warning.
Going to put this into our test lab with a private build and verify the fix overnight. |
Looked at the history - apparently this has ALWAYS been a bug with Akka.Cluster.Tools, going all the way back to when the Cluster Singeton was first introduced #1530 |
We'll see if tests and the MNTR need to be adjusted, but our tracing system found definitive proof that the hand off was being screwed up, which lead to two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Changes
Eliminates the source of #6793, which was caused by using the incorrect ordering methodology when it came to determining which
ClusterSingletonManager
to hand-over to during member state transitions.close #6973
close #7196
Supersedes #7287 and #7197
Checklist
For significant changes, please ensure that the following have been completed (delete if not relevant):
AppVersion
joins cluster #7197