[Proposal] Grain migration #7692
Comments
Would there be any disruption to grain extensions as a result of a migration? I'm thinking about transactions in particular, but this applies to other scenarios as well.
We've moved this issue to the Backlog. This means that it is not going to be worked on for the coming release. We review items in the backlog at the end of each milestone/release and, depending on the team's priority, we may reconsider this issue for the following milestone.
I think this grain-migration-during-rebalancing functionality is essential. Both Akka and Dapr have this functionality as I understand it. I'm new to Orleans, so apologies if any of what I say here is incorrect.

Do you intend grains to be migrated/rebalanced even while busy? I think it is essential that this be the case - i.e., I think grains should be migrated during rebalancing even if they are busy in the middle of one (or more) operations. However, this would mean that grains would be proactively deactivated in the middle of operations, which is not the case today; i.e., grains today are only deactivated after being idle for some time. Therefore, perhaps migration mid-operation needs to be opt-in via configuration. I think Dapr enables all actors (of any type) to be rebalanced via a configuration setting. I also think there should be a configurable timeout governing how long each grain has to wind down before it is migrated.

Also, what is the priority/milestone for implementing this feature? I noticed that #7470 was closed in favour of this proposal and that #7470 was on the roadmap for .NET 7.0 as stated by #7467; however, this proposal is currently on the Backlog milestone rather than the ".NET 7.0" or "Planning-4.0" milestone.
Neither Dapr nor Akka have this functionality. Actors in Akka are not virtual actors: their location is decided up-front and does not change while the actor is active. Dapr uses hash-based routing and does not have a directory, so actor placement cannot be customized, and it cannot migrate an actor from one place to another while preserving its ephemeral state (including pending requests). Dapr's rebalancing behavior is needed because of its lack of a directory: whenever a host joins or leaves the cluster (e.g., a scaling event or an upgrade), the actors necessarily shift around, because the hash ring has changed.

The purpose of this proposal is the ability to migrate grains without first deactivating them, allowing them to preserve state and move atomically from one host to another. If a grain is processing some request, it would at least finish that request before migrating the other requests elsewhere. This is not high priority. There are some benefits to having this functionality, but it's certainly not critical (it's not as if it exists elsewhere). Is there something you're looking for in this feature?
Thanks @ReubenBond. I'm looking for some grains to be migrated to a new silo when it joins the cluster. E.g., if there are two silos and a third joins the cluster due to scaling, 1/3 of the grains from each silo would move to the new silo, thus balancing the load. I'm not concerned about migrating grains without first deactivating them - I'm happy for them to be deactivated, migrated, and then reactivated. I'm only concerned that grains are not relocated to new silos added to the cluster during scaling. That's specifically the functionality I was referring to: migrating existing actors to new nodes added to a cluster during scaling. As you say, the absence of a directory forces Dapr into this behaviour, but that behaviour is desirable for me.
@oising's activation shedding package ought to help with that: https://github.com/oising/OrleansContrib.ActivationShedding
Thanks @ReubenBond! I wasn't aware of that library. You are correct in that we have a fairly fixed set of grains which are active at all times. The load across these grains is highly dynamic: most of the time, most active grains are being drip-fed work to do. However, under some circumstances (unpredictably so), hundreds of those grains can each start fully consuming 4-8 vCPUs for up to a few minutes. Kubernetes scales up the cluster in response to this increased load, at which time we need many grains to migrate to those new nodes to complete the work. Many of the grains needing to be migrated will be in the middle of executing one or more operations when the scaling event occurs (they are all re-entrant). I'll check out the OrleansContrib.ActivationShedding library. Thanks again!
@ReubenBond, upon inspection of OrleansContrib.ActivationShedding, it appears that the mechanism this library uses to migrate grains to a different silo is to deactivate them (specifically, by invoking `DeactivateOnIdle()`). However, it is stated above in this issue that "voluntarily deactivating the grain will not likely cause that grain to be reactivated on a different silo because caches will point to the original silo, which will happily reactivate the grain upon receiving a message." Will Orleans therefore actually reactivate a grain on a different silo after it has been voluntarily deactivated this way?
@ReubenBond out of all of the options for naming, I prefer the terminology you've used: dehydrating vs rehydrating. If my memory serves me correctly, BizTalk also used this term to describe a similar operation for orchestrations.
Orleans 4.0 is different in that regard, but the approach works for Orleans 3.x.
Is there or will there be an approach that works in Orleans 4.0?
Yes, the approach which we plan for 4.0 is an atomic directory update.
Is this proposal described somewhere that I could read to understand it? Will the approach of proactively deactivating a grain with `DeactivateOnIdle()` still work with that design?
@ReubenBond, if a grain is migrated between silos due to rebalancing, what would you expect to happen to its in-progress calls? This will be especially relevant if those calls are long-running (as proposed by #7649) and if the long-running call chains are long. If in-progress calls are cancelled when a grain is migrated, then that would cancel the entire call chain, which means a large amount of in-flight work could be cancelled because an upstream grain is migrated during rebalancing. This would make rebalancing much more disruptive (i.e., impacting many more grains than just those being migrated) in the case where there are long-running method calls and long call chains.
Calls (long-running or not) will need to complete before migration. As shown above, migration follows the regular deactivation path, so this is not related to migration in particular. If calls fail to complete before some timeout elapses, then they'll be rejected. We cannot terminate work (cancellation is cooperative), so if a grain has some runaway work, then it will continue running. We can throw exceptions when the runaway grain tries to access the runtime, though. This proposal is only about the mechanism for grain migration, not the policies. If you don't want a grain to migrate, then there will be some way to specify a policy stating that it must never be migrated.
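For illustration, a cooperative cancellation check inside a grain method might look like the following sketch; the work helpers (`HasMoreWork`, `DoOneChunkAsync`) are hypothetical:

```csharp
// Sketch: cancellation is cooperative - the runtime can only signal, and the
// method body must observe the token. GrainCancellationToken is Orleans'
// mechanism for cancellable grain calls; HasMoreWork and DoOneChunkAsync are
// hypothetical placeholders.
public async Task CrunchAsync(GrainCancellationToken gct)
{
    while (HasMoreWork())
    {
        // A runaway method that never checks the token cannot be stopped.
        gct.CancellationToken.ThrowIfCancellationRequested();
        await DoOneChunkAsync();
    }
}
```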
Thanks @ReubenBond. I figured that was the case. I was thinking that if a grain eligible for migration during rebalancing were informed that Orleans is seeking to migrate it (perhaps via an OnMigrating method call), then the grain could gracefully cancel its long-running operations so that it could be migrated. However, doing so would also mean cancelling any in-progress outbound calls, which would mean cancelling a lot of downstream in-progress work if the grain being migrated were upstream in a long call chain. I'm just making the point that having long-running calls in long call chains runs counter to dynamic/responsive cluster rebalancing. I was considering a design that had deep call chains containing long-running calls, but I require load to be dynamically/quickly rebalanced across the cluster when it is scaled up - which means I need to consider a different design: one where the heavy lifting is decomposed into smaller, shorter-running calls.
How long are these long-running calls that you speak of, and how frequently do you expect the cluster to be rebalancing?
The most upstream grain operations in the call chain would run for up to 5 minutes. They would spend most of their time waiting for downstream work to complete. The calls would be structured as a tree, where each node in the tree might have one minute of CPU-time work to do, but each node would also need to wait for all of its child nodes to complete before completing.

The load is highly dynamic in that most (99.999%) of events received from the outside world are handled with small trees executed over the cluster without much work to do. However, 0.001% of events must be handled with a large tree where each node has a lot of work to do - which means that when one of these events is received, the cluster needs to scale up substantially. I expect that the cluster will need to scale up to deal with these kinds of external events about 10-20 times per month per customer.

Furthermore, the tree is "spun up / scheduled" first, when the external event is received, before the work is executed, to ensure that all interdependencies between tree nodes are known before execution commences. Therefore, all grains are active prior to the heavy lifting beginning. Thus, when the heavy lifting begins and more nodes are added to the cluster, the grains are all busy doing work and/or waiting for child grains to complete their work. If the upstream grains (in the first 5-10 layers of the tree) were to be idled and migrated, then all the downstream work at the lower levels of the tree would also have to be cancelled and restarted.

I have some ideas for dealing with this, which involve a different decomposition of work across grains and grain operations. I'm just pointing out that my original design won't dynamically scale due to long-running calls in long call chains - just in case that might influence the design proposed by this issue.
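For illustration, the tree-shaped fan-out described above might look like the following sketch; all of the grain and helper names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Orleans;

// Sketch: each node grain does its own CPU-bound work, then awaits all of its
// children before completing, so upstream calls stay in flight for the
// duration of the whole subtree. All names here are illustrative.
public interface INodeGrain : IGrainWithStringKey
{
    Task ExecuteAsync();
}

public sealed class NodeGrain : Grain, INodeGrain
{
    public async Task ExecuteAsync()
    {
        await DoLocalWorkAsync(); // up to ~1 minute of CPU-bound work per node

        // Fan out to the child nodes; this call cannot complete until the
        // entire subtree below it has completed.
        var childCalls = GetChildIds()
            .Select(id => GrainFactory.GetGrain<INodeGrain>(id).ExecuteAsync());
        await Task.WhenAll(childCalls);
    }

    private static Task DoLocalWorkAsync() => Task.CompletedTask;             // placeholder
    private static IEnumerable<string> GetChildIds() => Array.Empty<string>(); // placeholder
}
```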
I was thinking about this some more over the weekend. What happens if `DeactivateOnIdle()` is called while the grain is executing a call - does the grain deactivate immediately, or after the call completes? What if the grain is re-entrant and has multiple concurrent calls at the time `DeactivateOnIdle()` is called - does Orleans wait for all of them to complete? If the latter is true, then a very busy re-entrant grain that is always executing one or more calls will never actually be deactivated after calling `DeactivateOnIdle()`. Ideally, Orleans will queue all new method calls to a grain after `DeactivateOnIdle()` is called, allowing in-flight calls to drain. The next question is whether Orleans will invoke the proposed migration hooks as part of this; I think it would be very useful if Orleans were to invoke them when the grain is about to be migrated.
Lots of good questions. When `DeactivateOnIdle()` is called, new calls are queued while currently executing calls are allowed to complete - I'm happy to say that the ideal case is what's implemented today. The regular deactivation procedure will be followed (i.e., it will not call the proposed migration hooks).
That's really great to hear! I'm very impressed.
Is there or will there be a mechanism for grains to be notified that Orleans wishes to migrate them to another silo due to cluster rebalancing? I.e., is there or will there be a mechanism for a grain to be informed that it is being deactivated (i.e., that its deactivation is pending)? This would allow a grain involved in long-running method calls to gracefully stop/pause that work so that those in-progress method calls complete more quickly, thus allowing the grain to be deactivated and migrated more quickly.
Hi @ReubenBond - this issue is still in the backlog, although it is a significant one for elastic systems. Is there any view on how it will be progressed? Side question: the ActOp paper reports some significant performance enhancements, and since that work was implemented directly on Orleans, is there any path for it to become an option in the framework? Thanks!
I am hoping to include this in .NET 8 and to start making progress on it soon.
That would be the next step, but I think it's an interesting one. Once this item is in, we'll have the basic mechanism required, and we can consider clever algorithms for auto-optimization like the graph-cutting one presented in ActOp.
* Grain migration
* Review feedback
* Add helper methods to migration state, migrate grain state by default, add IGrainDirectory fallback tentatively
* Fixup last commit
* Perform Migration lookup asynchronously and avoid starvation
* Minor cleanup, mostly drive-by
* Implement support for placement hints in placement directors where it makes some sense
* Add migration tests for Grain<T> and IPersistentState<T>
* Propagate captured RequestContext to dehydration and add fault tests
* Add GrainDirectory tests for atomic update
* RedisGrainDirectory: use a hash instead of serialized blob, update style
* Use JSON instead of hash again - retaining compatibility
* Additional tests and fix LRU to support updating
* Fix doc comment for IGrainMigrationParticipant
* Minor doc comment fix
* Support live migration for PubSubRendezvousGrain, MemoryStreamQueueGrain, and ReminderTableGrain
* Fixes
* Fix test
* Fix test
This is a proposal to implement live workload migration in Orleans. Live grain migration gives grains the capability to move from one host to another without dropping requests or losing in-memory state. The proposal is for a live migration mechanism; the policy, which determines when, where to, and which grains to migrate, is a separate concern, and there may be multiple policies which perform migration.
Motivation
Capacity utilization improvements for rolling upgrades & scale-out
When a silo is added to a cluster, active grains are not moved to the new silo. When Orleans was originally released, it was common to use blue/green deployments (on Azure Cloud Services Worker Roles), so the cluster was much less dynamic than it is today, where rolling deployments are fairly commonplace. Depending on the workload (number of grains, calling patterns), rolling deployments can result in grain distribution becoming heavily skewed towards the silos which were upgraded first. Therefore, some hosts may become heavily loaded while others are under-utilized, and adding additional resources will not promptly reduce utilization of the existing resources. There are two problems here: load remains skewed after rolling upgrades, and newly added capacity does not relieve already-loaded hosts. If we have a mechanism for live migration of grains, we could move a portion of the load on heavily loaded servers to under-utilized servers.
Why deactivation is not enough
Deactivating some portion of grains on the heavily loaded servers is not enough to implement load leveling. We will explore why that is in this section.
Grains are responsible for registering themselves in the grain directory as a part of the activation process. When a grain deactivates, it is also responsible for removing its own registration.
Thus, if a grain calls `DeactivateOnIdle()` today, there will be a window of time where the grain is not registered anywhere, and any silo is free to try to place it somewhere. In most cases, silos will choose different targets and there will be a race for the new activations on each silo to register themselves in the grain directory. Only one activation can win this race, and the others will need to forward pending messages to the winning activation and update their caches.

However, silos cache directory lookups, and those caches aren't immediately invalidated when a grain is deactivated. We no longer require that messages carry an `ActivationId` which must match an existing activation, or that they set the `NewActivation` flag on the message (which has also been removed). Hence, messages sent to the original silo will reactivate the grain without causing a message rejection and cache invalidation. This means that once a frequently messaged grain is activated on a silo, voluntarily deactivating the grain will not likely cause that grain to be reactivated on a different silo because caches will point to the original silo, which will happily reactivate the grain upon receiving a message. The rationale for the changes which led to this behavior is (I believe) sound: it reduces the flurry of rerouting and multiple directory lookups caused by activating a new grain, and it simplifies the process. However, it has the arguable downside of making grain placement stickier than is desirable in all circumstances.
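For concreteness, a minimal sketch of the voluntary-deactivation pattern described above, using `Grain.DeactivateOnIdle()`; the grain interface and the load check are illustrative:

```csharp
// Sketch: shedding load via voluntary deactivation using Grain.DeactivateOnIdle().
// As described above, this is unreliable for re-placement: remote directory
// caches still point at this silo, so the next message will usually just
// reactivate the grain here. IWorkerGrain and the load check are illustrative.
public sealed class WorkerGrain : Grain, IWorkerGrain
{
    public Task DoWorkAsync()
    {
        // ... handle the request ...

        if (ThisSiloIsOverloaded())
        {
            // Deactivates this activation once currently executing calls
            // complete, removing its directory registration and opening the
            // race window described above.
            DeactivateOnIdle();
        }

        return Task.CompletedTask;
    }

    private static bool ThisSiloIsOverloaded() => false; // hypothetical load signal
}
```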
Soft-state preservation during upgrades
Some customers have told us that they would like to preserve running grain state during migrations.
This tends to be soft state: state which can be reconstructed (perhaps loaded from a database) at a cost.
If a live grain migration mechanism includes a facility to preserve soft state across a migration, then that could be utilized by these customers to preserve soft state.
Additionally, it can be utilized by grain state providers to avoid needing to reload state from the database during the migration, lessening database load.
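For illustration, a minimal sketch of the kind of soft state in question, assuming an Orleans 4.x-style `OnActivateAsync(CancellationToken)` override; the grain, interface, and loader names are hypothetical:

```csharp
// Sketch: the kind of soft state this section describes - reconstructible,
// but at a cost. Without state transfer, every move to a new silo pays the
// reload below. The grain, interface, and loader are illustrative.
public interface IOrderStatsGrain : IGrainWithGuidKey { }

public sealed class OrderStatsGrain : Grain, IOrderStatsGrain
{
    private Dictionary<DateOnly, int> _ordersByDay = new();

    public override async Task OnActivateAsync(CancellationToken cancellationToken)
    {
        // Soft state: recoverable from the database, but expensive to rebuild.
        _ordersByDay = await LoadCountsFromDatabaseAsync();
        await base.OnActivateAsync(cancellationToken);
    }

    private static Task<Dictionary<DateOnly, int>> LoadCountsFromDatabaseAsync()
        => Task.FromResult(new Dictionary<DateOnly, int>()); // placeholder
}
```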
Self-optimizing placement
Previous research (referred to as ActOp) into optimizing actor-based systems (in particular, Orleans) indicates that substantial benefits can be obtained by co-locating grains which frequently communicate with each other. The ActOp research involved implementing an efficient distributed graph-partitioning algorithm to incrementally optimize inter-host RPC calls by co-locating the groups of grains which communicate with each other the most, without violating load-levelling constraints. In the Halo Presence scenario which the paper targeted, experiments showed a decrease from around 90% of calls being remote (using random placement) to around 12% of calls being remote. As a result, 99th percentile latency decreased by more than 3x and throughput increased by 2x. The improvements would vary substantially based upon cluster size and the particulars of the workload, but they demonstrate the potential gains.
Summary of the functionality

* Requests which cannot be forwarded during a migration will fail with a `SiloUnavailableException` on the call chain. This issue exists today with activation races and with clients calling gateways, but currently such requests are left to time out.
* Migrating grains follow the regular deactivation path (e.g., `OnDeactivateAsync`).

Design
The grain lifecycle will be modified with two new hooks, for initiating and completing migration. Any process can crash at any point; that fact is omitted below for simplicity.
The grain class and components injected into the grain can enlist in a per-activation migration lifecycle hook:
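A sketch of what enlisting in such a hook could look like, based on the `IGrainMigrationParticipant` interface named in the commit list above; the context member signatures (`TryAddValue`/`TryGetValue`) and the grain names are assumptions for illustration:

```csharp
// Sketch: a grain that carries soft state across a migration by enlisting in
// the per-activation migration hook. IGrainMigrationParticipant is named in
// the commit list above; the context member signatures (TryAddValue /
// TryGetValue) and the grain names are assumptions.
public sealed class ShoppingCartGrain : Grain, IShoppingCartGrain, IGrainMigrationParticipant
{
    private Dictionary<string, int> _quantitiesByItemId = new();

    public void OnDehydrate(IDehydrationContext context)
    {
        // Capture in-memory state into the migration payload on the source silo.
        context.TryAddValue("cart", _quantitiesByItemId);
    }

    public void OnRehydrate(IRehydrationContext context)
    {
        // Restore the state on the destination silo, if it was transferred.
        if (context.TryGetValue("cart", out Dictionary<string, int>? cart) && cart is not null)
        {
            _quantitiesByItemId = cart;
        }
    }
}
```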
When `OnDeactivateAsync` is called, the reason will be set to 'Dehydrating' so that app code can decide whether or not to persist state. Any developer performing a state write in `OnDeactivateAsync` either doesn't require that the state is written, or doesn't realize that a process can crash at any moment and that they therefore cannot rely on `OnDeactivateAsync` being called to preserve state.

Discussion: What terminology should be used here? This migration functionality could be used to put a grain into storage (e.g., cached on disk or in a database), so if it were repurposed for that, then maybe a more generic terminology should be used - e.g., dehydrating vs rehydrating, hibernating vs waking (estivate is the reverse of hibernate, but that's a very esoteric word), pause vs resume, or freeze vs thaw. Python uses the terms pickle & unpickle, but in the context of serialization, which this involves but which isn't the purpose. Personally, I favor dehydrate vs rehydrate, since we are using the analogue of grains. A little research shows that preserving grains for long periods largely involves removing oxygen, reducing moisture, and reducing temperature.
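Returning to the deactivation reason above, a sketch of how app code might branch on it, assuming an Orleans 4.x-style `OnDeactivateAsync(DeactivationReason, CancellationToken)` override; the `Dehydrating` enum member is the proposal's terminology, not a confirmed API name:

```csharp
// Sketch: skip the durable write when the grain is only being dehydrated for
// migration, since in-memory state travels with the activation. The
// 'Dehydrating' reason code follows the proposal's wording and may not match
// the final enum member; _state is an injected IPersistentState<T>.
public override async Task OnDeactivateAsync(DeactivationReason reason, CancellationToken cancellationToken)
{
    if (reason.ReasonCode != DeactivationReasonCode.Dehydrating)
    {
        // A true deactivation: flush soft state to durable storage.
        await _state.WriteStateAsync();
    }

    await base.OnDeactivateAsync(reason, cancellationToken);
}
```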
Enhancements / future work
See: [Idea] Salvage Activations from Defunct Directory Partitions #2656
Pausing the processing of background tasks is not straightforward: they can capture the activation's `TaskScheduler` and schedule any work they like on it. Perhaps we deem this acceptable, but a better solution would be to have a more resilient in-memory directory.
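A sketch of the capture problem just described; the loop body is a hypothetical placeholder:

```csharp
// Sketch: why pausing background work is hard. Grain code can capture the
// activation's TaskScheduler and keep scheduling work on it, outside the
// runtime's control. LoopBodyAsync is a hypothetical placeholder.
public Task StartBackgroundLoop()
{
    // Inside a grain call, TaskScheduler.Current is the activation's scheduler.
    var activationScheduler = TaskScheduler.Current;

    _ = Task.Factory.StartNew(
        async () =>
        {
            while (true)
            {
                await LoopBodyAsync();
                await Task.Delay(TimeSpan.FromSeconds(5));
            }
        },
        CancellationToken.None,
        TaskCreationOptions.None,
        activationScheduler); // work keeps landing on the activation's scheduler

    return Task.CompletedTask;
}

private static Task LoopBodyAsync() => Task.CompletedTask; // placeholder
```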
Related: #4694