Minor improvements to Service Fabric providers #3250

Merged
merged 4 commits into from
Aug 1, 2017

Conversation

Member

@ReubenBond ReubenBond commented Jul 26, 2017

In testing, I found that the check I had eagerly added to avoid processing outdated Service Fabric partition change notifications was causing many lost updates during disaster recovery scenarios. I'm not blaming SF; it's likely because of how I was comparing partition equality, i.e., two ResolvedServicePartition instances which belong to a Singleton partition must belong to the same partition, because logically there can only be one singleton partition. This PR removes that check, since it's unnecessary (any stale information will quickly be superseded by fresh information).

Fixed a NullReferenceException which was thrown when a client attempted to resolve silos before the silo service had been successfully created.

Increased logging.

Reduced the eager refresh interval from 30s to 5s. If logging is verbose, this will cause 5x more logs from that process. The MaxStaleness was also reduced to match the refresh interval, so blacklisted gateways are cleared much more quickly. I see no downside to this: the shortened polling interval is still not aggressive.

@@ -67,14 +67,16 @@ public FabricGatewayProvider(IFabricServiceSiloResolver siloResolver)
/// <inheritdoc />
public bool SubscribeToGatewayNotificationEvents(IGatewayListListener subscriber)
{
this.log.Verbose($"Unsubscribing {subscriber} to gateway notification events.");
Contributor

Subscribe *
typo

Member Author

ahhh thanks

Member Author

I added these logs because of the weird behavior which eventually led to #3249

this.log.Info($"Update for partition {updated} is superseded by existing version.");

// Do not update the partition if the existing one has a newer version than the update.
break;
Contributor

I'm not an expert on Service Fabric, but would removing this pose the risk of an older partition overwriting a newer one? Yes, the older partition would mostly be corrected by a newer one eventually, but this adds unnecessary handling in partitionChange, which can be avoided, right? Or is this premature optimization?

Member Author

Not really, I was just finding that these checks really don't add value. In the worst case, where there's some race and updates arrive out of order, we have stale information for one polling cycle (5 seconds).
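
To illustrate the worst case described above, here is a minimal sketch of a last-write-wins table that is corrected by a periodic refresh. `SiloTable`, `Refresh`, and `StaleUpdateDemo` are hypothetical names invented for this example; this is not the actual resolver code, and the real refresh mechanism may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the resolver's partition table;
// this is NOT the actual FabricServiceSiloResolver.
public sealed class SiloTable
{
    private readonly Dictionary<Guid, string> silosByPartition = new Dictionary<Guid, string>();

    // Last write wins: every notification is applied, with no version check.
    public void OnPartitionChange(Guid partitionId, string silos)
        => this.silosByPartition[partitionId] = silos;

    // Periodic refresh (assumed to run once per polling cycle): re-resolve
    // every known partition and overwrite whatever is currently stored.
    public void Refresh(Func<Guid, string> resolveCurrent)
    {
        // Iterate over a copy of the keys so the dictionary can be updated safely.
        foreach (var id in new List<Guid>(this.silosByPartition.Keys))
        {
            this.silosByPartition[id] = resolveCurrent(id);
        }
    }

    public string GetSilos(Guid partitionId) => this.silosByPartition[partitionId];
}

public static class StaleUpdateDemo
{
    public static (string BeforeRefresh, string AfterRefresh) Run()
    {
        var partition = Guid.NewGuid();
        var table = new SiloTable();

        table.OnPartitionChange(partition, "silo-B"); // fresh update arrives first
        table.OnPartitionChange(partition, "silo-A"); // stale update arrives late and wins

        var before = table.GetSilos(partition);

        // The next polling cycle resolves the current state and corrects the table.
        table.Refresh(_ => "silo-B");

        return (before, table.GetSilos(partition));
    }
}
```

In this sketch the stale value survives only until the next `Refresh` call, which corresponds to the single polling cycle mentioned above.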

/// </returns>
public static bool IsOlderThan(this ResolvedServicePartition left, ResolvedServicePartition right)
{
return left.Info.Id == right.Info.Id && left.CompareVersion(right) < 0;
Contributor

Can this be just left.CompareVersion(right) < 0;? That way we remove the unnecessary check but still compare the version, so that FabricServiceSiloResolver.OnPartitionChange won't process an older partition.

Member Author

we can, but the check doesn't provide value

Member Author

Actually, that wouldn't work, because the version would be reset when the partition IDs change (and anyhow the two versions wouldn't be related to each other, since they're for different physical partitions).
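
A rough sketch of why a version-only comparison misfires once the partition ID changes. `PartitionSnapshot` is a hypothetical stand-in for ResolvedServicePartition that models only an ID and a numeric version, and `HasLowerVersionThan` is an invented name for the suggested simplification; neither is part of Service Fabric or Orleans.

```csharp
using System;

// Hypothetical stand-in for System.Fabric.ResolvedServicePartition; only the
// two values needed for the comparison are modeled.
public sealed class PartitionSnapshot
{
    public PartitionSnapshot(Guid id, long version)
    {
        Id = id;
        Version = version;
    }

    public Guid Id { get; }
    public long Version { get; }
}

public static class PartitionSnapshotExtensions
{
    // Same shape as IsOlderThan in the diff: versions are compared only
    // when both snapshots refer to the same partition ID.
    public static bool IsOlderThan(this PartitionSnapshot left, PartitionSnapshot right)
        => left.Id == right.Id && left.Version < right.Version;

    // The suggested simplification: compare versions regardless of ID.
    public static bool HasLowerVersionThan(this PartitionSnapshot left, PartitionSnapshot right)
        => left.Version < right.Version;
}

public static class VersionResetDemo
{
    public static (bool WithIdCheck, bool WithoutIdCheck) Run()
    {
        // The existing entry has accumulated a high version number.
        var existing = new PartitionSnapshot(Guid.NewGuid(), version: 42);

        // The partition is recreated: new ID, and the version starts over.
        var recreated = new PartitionSnapshot(Guid.NewGuid(), version: 1);

        return (recreated.IsOlderThan(existing), recreated.HasLowerVersionThan(existing));
    }
}
```

Under these assumptions, the version-only comparison would classify the recreated partition as older than the entry it replaces, so the fresh update would be dropped, while the ID-qualified comparison does not.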

Contributor

xiazen commented Jul 31, 2017

@dotnet-bot test netstandard-win-functional

@xiazen xiazen merged commit a568af6 into dotnet:master Aug 1, 2017
sergeybykov pushed a commit to sergeybykov/orleans that referenced this pull request Aug 1, 2017
* Minor Service Fabric provider fixes and tweaks

* Additional logging in SF gateway provider

* FabricMembershipOracle reduce polling interval from 30s to 5s

* review feedback
ReubenBond added a commit that referenced this pull request Aug 7, 2017
* Minor Service Fabric provider fixes and tweaks

* Additional logging in SF gateway provider

* FabricMembershipOracle reduce polling interval from 30s to 5s

* review feedback
@github-actions github-actions bot locked and limited conversation to collaborators Dec 9, 2023