-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heterogenous silos support #2443
Conversation
… an other to explicitely return the local TypeCodeMap for the silo in TypeManager
…s given a GrainId
…BodyDeserializationFailure
/// <summary> | ||
/// The number of seconds to refresh the cluster grain interface map | ||
/// </summary> | ||
public TimeSpan TypeMapRefreshTimeout { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment implies that this is a period (amount of time between refreshes), but the property name implies this is the amount of time before an exception will be thrown (max amount of time for a single refresh before failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I renamed to TypeMapRefreshInterval
|
||
private readonly PlacementStrategy defaultPlacementStrategy; | ||
internal IDictionary<SiloAddress, GrainInterfaceMap> GrainInterfaceMapsBySilo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could instead be IReadOnlyDictionary<,>
by returning the dictionary itself and avoiding the copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right, completely missed that
public IList<SiloAddress> GetCompatibleSiloList(GrainId grain) | ||
{ | ||
var typeCode = grain.GetTypeCode(); | ||
var compatibleSilos = GrainTypeManager.GetSupportedSilos(typeCode).Intersect(AllActiveSilos).ToList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has to be cheap, since it's on the warm/hot path. Currently, it is too allocation-happy for comfort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is really more a warm path than a hot path (this code is executed for new placements only). I think that yes, we could optimize this method, with some caching mechanism. But it is not that trivial to implement (a bug here and we could miss forever a new silos that joined, or try to place activations on a silos dead for a long time, ect.).
So yes, it can be optimize, but it will add complexity. I am not sure it is worth doing it, especially in this PR.
I can add a comment in this method and open an issue to track this if you want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that makes sense. We can consider looking into it afterwards, then
@@ -13,18 +15,26 @@ namespace Orleans.Runtime | |||
internal class GrainTypeManager | |||
{ | |||
private IDictionary<string, GrainTypeData> grainTypes; | |||
private IReadOnlyDictionary<SiloAddress, GrainInterfaceMap> grainInterfaceMapsBySilo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant in the other comment was to use Dictionary<,>
here, since it's cheaper to enumerate:
Method | Mean | StdDev | Gen 0 | Allocated |
---------- |----------- |---------- |------- |---------- |
Concrete | 27.2424 ns | 0.8356 ns | - | 0 B |
Interface | 51.0121 ns | 1.1681 ns | 0.0125 | 56 B |
EDIT: I don't think I ever sent that other comment - whoops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
@@ -70,6 +70,16 @@ public static string GetGrainTypeName(this IPlacementContext @this, int typeCode | |||
return grainClass; | |||
} | |||
|
|||
public static string GetGrainTypeNameAndSupportedSilos(this IPlacementContext @this, GrainId grainId, out IList<SiloAddress> supportedSiloAddresses, string genericArguments = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method used? I couldn't see any usages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relics of a previous rebase, removed
…TypeManager to avoid allocation" This reverts commit fc940b6.
…ger to avoid returning a copy
a546929
to
5da8979
Compare
@ReubenBond, please 'Squash and merge' when you all your feedback is taken care of. |
@@ -173,6 +209,14 @@ internal bool ContainsGrainInterface(int interfaceId) | |||
} | |||
} | |||
|
|||
internal bool ContainsGrainImplementation(int typeCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No usages of this?
Awesome @benjaminpetit great work! 👍 |
This is a PR for #2379
Currently, the GrainInterfaceMap is built on each silo, and is sent to clients when they connect to a gateway.
Proposed changes
I tried to divide the PR in logical commits so it is easier to review:
How and when the global GrainInterfaceMap is recalculated
To avoid race conditions and complicated error handling, we choose to refresh the GrainInterfaceMap using a timer: every timer tick, if a silo joined the cluster since the last computation, we will get the local GrainInterfaceMap from the new silo and merge it into the global map. If this fails, it will be retried on the next tick. This process is done on each silo in the cluster.
The drawback of this approach is that a silo that joined the cluster will not be chosen for placement by other silos until this timer fire. I don’t think this is an issue, and the timer interval can be tweaked in configuration.
Current implementation limitations