Minor improvements to Service Fabric providers #3250
@@ -67,14 +67,16 @@ public FabricGatewayProvider(IFabricServiceSiloResolver siloResolver)
/// <inheritdoc />
public bool SubscribeToGatewayNotificationEvents(IGatewayListListener subscriber)
{
this.log.Verbose($"Unsubscribing {subscriber} to gateway notification events.");
Subscribing* (typo)
ahhh thanks
I added these logs because of the weird behavior which eventually led to #3249
this.log.Info($"Update for partition {updated} is superseded by existing version.");

// Do not update the partition if the existing one has a newer version than the update.
break;
I'm not an expert on Service Fabric, but wouldn't removing this introduce the risk of an older partition overwriting a newer one? Yes, the older partition would usually be corrected by the newer one eventually, but this adds unnecessary handling on partition change that could be avoided, right? Or is this premature optimization?
Not really; I just found that these checks don't add value. In the worst case, where some race delivers updates out of order, we have stale information for one polling cycle (5 seconds).
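To make the trade-off in this thread concrete, here is a minimal C# sketch of last-write-wins handling. It assumes a hypothetical PartitionSnapshot record and PartitionCache class; these are not Orleans or Service Fabric types. With the version check removed, an out-of-order update can briefly win, but the next poll overwrites it.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified snapshot of a resolved partition. These are NOT the
// Service Fabric types; they only carry enough state to show the trade-off.
public sealed record PartitionSnapshot(Guid PartitionId, long Version, string[] Silos);

public sealed class PartitionCache
{
    private readonly Dictionary<Guid, PartitionSnapshot> partitions = new();

    // Last-write-wins: there is no "does the existing entry have a newer
    // version?" check. A stale update that arrives out of order is accepted,
    // and the next polling cycle overwrites it with fresh data.
    public void Apply(PartitionSnapshot update) => this.partitions[update.PartitionId] = update;

    // Returns the most recently applied snapshot, or null if none exists yet.
    public PartitionSnapshot? Get(Guid partitionId) =>
        this.partitions.TryGetValue(partitionId, out var snapshot) ? snapshot : null;
}
```

Applying version 2, then a late version 1, leaves version 1 in the cache until the next poll applies version 3; that window is the "stale for one polling cycle" worst case described in the reply.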
/// </returns>
public static bool IsOlderThan(this ResolvedServicePartition left, ResolvedServicePartition right)
{
return left.Info.Id == right.Info.Id && left.CompareVersion(right) < 0;
Can this be just left.CompareVersion(right) < 0? That way we remove the unnecessary check and still compare the versions, so that FabricServiceSiloResolver.OnPartitionChange won't process an older partition.
we can, but the check doesn't provide value
Actually, that wouldn't work: the version would be reset when the partition IDs change (and in any case, the two versions wouldn't be related to each other, since they belong to different physical partitions).
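The reply above can be illustrated with a small sketch, assuming (as the comment states) that versions restart when a new physical partition replaces an old one. PartitionInfo and IsOlderThanVersionOnly are hypothetical; IsOlderThan here only mirrors the shape of the extension method in the diff and does not call the real ResolvedServicePartition API.

```csharp
using System;

// Hypothetical stand-in for a resolved partition: an identity plus a version
// that is only meaningful relative to other versions of the SAME partition.
public sealed record PartitionInfo(Guid Id, long Version);

public static class PartitionComparison
{
    // Same shape as the IsOlderThan extension in the diff: versions are only
    // compared when both sides refer to the same partition.
    public static bool IsOlderThan(this PartitionInfo left, PartitionInfo right) =>
        left.Id == right.Id && left.Version < right.Version;

    // The suggested variant: compare versions without checking identity.
    public static bool IsOlderThanVersionOnly(this PartitionInfo left, PartitionInfo right) =>
        left.Version < right.Version;
}
```

If an old partition at version 7 is replaced by a new partition whose version restarted at 1, the version-only comparison wrongly reports the replacement as older, so it would be discarded. The identity check avoids that false positive, but then it never orders the two partitions at all, which fits the view in this thread that the check adds little value.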
@dotnet-bot test netstandard-win-functional
* Minor Service Fabric provider fixes and tweaks
* Additional logging in SF gateway provider
* FabricMembershipOracle reduce polling interval from 30s to 5s
* review feedback
* In testing, I found that the method I had eagerly employed to ensure that we were not processing outdated Service Fabric partition change notifications was the cause of many lost updates during disaster recovery scenarios. I'm not blaming SF; it's likely because of how I was comparing partition equality, i.e., assuming that two ResolvedServicePartition instances which belong to a Singleton partition must be the same partition, because logically there can only be one singleton partition. This PR removes that check, since it's unnecessary: any stale information will quickly be superseded by fresh information.
* Fixed a NullReferenceException which was thrown when a client attempted to resolve silos before the silo service had been successfully created.
* Increased logging.
* Reduced the eager refresh interval from 30s to 5s. If logging is verbose, this will produce roughly 6x as many refresh-related log entries from that process (one refresh every 5s instead of every 30s). MaxStaleness was also reduced to match the refresh interval, which means blacklisted gateways are cleared much more quickly. I see no downside to this; the shortened polling interval is still not aggressive.
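A rough sketch of the last two behavioral changes, using a hypothetical PollingSiloResolver class (not the actual FabricServiceSiloResolver or FabricMembershipOracle code): silo lookups return an empty list instead of dereferencing null before the first successful resolution, and MaxStaleness is tied to a 5-second refresh interval. The member names are assumptions chosen for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical polling resolver; illustrative only, not the Orleans API.
public sealed class PollingSiloResolver
{
    // Refresh every 5 seconds (previously 30 seconds, per the PR description).
    public static readonly TimeSpan RefreshInterval = TimeSpan.FromSeconds(5);

    // Staleness bound set equal to the refresh interval.
    public static readonly TimeSpan MaxStaleness = RefreshInterval;

    // Null until the first successful resolution, e.g. while the silo
    // service has not been created yet.
    private IReadOnlyList<string>? silos;

    public void OnResolved(IReadOnlyList<string> resolved) => this.silos = resolved;

    // Guard for the not-yet-resolved case instead of dereferencing null.
    public IReadOnlyList<string> GetSilos() => this.silos ?? Array.Empty<string>();

    // Number of refreshes that fit in a window at the given interval.
    public static long PollsPer(TimeSpan window, TimeSpan interval) =>
        window.Ticks / interval.Ticks;
}
```

Over one minute, a 30s interval gives 2 refreshes and a 5s interval gives 12, so any logging emitted once per refresh grows sixfold.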