Ability to rebalance allocation placements #1635
So Nomad does do a best-effort spread between clients when they are running the same job, so we wouldn't need to add that. I think the point you are getting at is that you would like Nomad to rebalance occasionally. If I am drawing the right conclusion, maybe we should retitle the issue?
That, but also: I would like a guarantee that if I have a task group with 3 instances and 2 or more clients, Nomad doesn't run all 3 containers on the same instance. If it does, and I need to take an instance offline, I can't do it without either taking the whole service down with it, or increasing the count and hoping that more containers are started on other instances. At the same time, I don't want an unschedulable job because I have 2 instances and a count of 3 with `distinct_hosts` set.
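For reference, the strict form of the constraint being discussed looks roughly like this in a job spec (a sketch; the job, group, and task names are placeholders):

```hcl
job "web" {
  group "api" {
    count = 3

    # Strict form: placement fails outright if fewer than
    # `count` eligible clients are available.
    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }

    task "server" {
      driver = "docker"
      # ...
    }
  }
}
```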
It is not currently a feature, and I think rebalancing plus the current behavior would solve that. If you could initiate a rebalance, the scheduler would naturally want to spread the tasks across different hosts (without the `distinct_hosts` constraint even set). I am going to rename the issue.
Being able to initiate a rebalance is a good idea. But I'd also add that it would seem appropriate for Nomad to automatically attempt a rebalance upon failing to make an allocation (due to lack of resources). That is, I would expect Nomad to rebalance if doing so would make the allocation succeed. I was actually surprised to learn that this wasn't already implemented in a scheduler like Nomad.
+1 for a best-effort option.
We face the same issue with Nomad 0.5.5; I assume this is still open. To put it simply, we don't want to have to manually construct artificial job specifications (such as one task group per availability zone) in order to have a job with at least two instances spread across multiple AZs in AWS. Not doing so creates a serious reliability impediment, and manually manipulating the scheduler undermines the value of having a scheduler in the first place.

Going further, I would argue that tight bin-packing needs to be balanced against reliability requirements. I would rather see relatively frequent, carefully orchestrated service movements than a single highly loaded server. I think this also plays into how Nomad works with automatic scaling of clusters, such as AWS ASGs.

To me this is a fairly big feature, but also the next critical thing that determines whether Nomad will be the scheduler of choice or we are forced to move to an alternate technology. It feels wholly wrong for the user of a scheduler to have to solve these inherent challenges of scheduling. Thoughts, reactions?
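For what it's worth, later Nomad releases (0.9+) added a `spread` stanza that addresses the AZ case directly, without per-AZ task groups; a minimal sketch, assuming AWS fingerprinting is available on the clients:

```hcl
job "web" {
  # Prefer spreading allocations across AWS availability zones
  # rather than bin-packing them onto the fewest nodes
  # (best effort, weighted scoring).
  spread {
    attribute = "${attr.platform.aws.placement.availability-zone}"
    weight    = 100
  }
  # ...
}
```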
Any updates/thoughts on this feature? HA is definitely a higher priority for us than efficiency; we would love to be able to distribute onto new Nomad clients as they become available!
Even merely the ability to re-evaluate a job away from its current node would be something.
Looking forward to this as well. I was amazed to see one job with 3 instances on 1 node while 2 other nodes were added to the cluster and sat doing nothing. I expected a rebalance to occur towards the new nodes.
We are at version 0.8.3 at this point. Is there a "vote with your money" option? I would suggest we could collectively pay HashiCorp to develop this very important feature.
@alitvak69 we mightn't need to go all the way to crowdfunding quite yet. I'm building an internal shared hosting platform at the moment, and auto-rebalancing across hosts, along with being able to loosen the grip on the bin-packing a bit so we can spray across a not-so-elastic internal cloud, are the things we need from HashiCorp in the next few months. Otherwise we're going to need to follow the rest of the market and do the same Kubernetes thing as everyone else. We're sure as hell not a fly-by-night tiny basement operation, and looking at how much money we're pouring into infrastructure automation for bad solutions that don't work, I suspect there are a few people willing to put a lot of investment into the HashiCorp ecosystem if we could just get these seemingly small kinks out of the way.

But seriously, I know we got the reschedule stanza recently, so there's obviously some recognition that shuffling jobs between existing nodes is a desired behaviour in some scenarios. It would be nice to be able to encourage Nomad to be a bit more lax about keeping a job on the hottest node if that means more likely startup success and runtime stability, even if we end up with some extra capacity wasted on spare servers.
Hey all, just an update on this issue. We understand it is important, and there are plans for both a short-term and a long-term solution. The short-term solution is an allocation lifecycle API where individual allocations can be killed and the scheduler will replace them. The longer-term solution is a rebalancer system that detects these issues and rebalances the cluster over time or on demand via an API.
@dadgar sounds good! Is the short term Nomad 0.11?
@jippi Aiming to have the allocation lifecycle APIs in the 0.9.x series.
@dadgar nice!
@dadgar any news?
@dadgar now that we are in the 0.9.x releases, is there a more solid target for the lifecycle APIs? Much appreciated!
The allocation lifecycle APIs made it into 0.9.2, documented here: |
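In practice, the lifecycle API lets you stop an individual allocation and have the scheduler replace it, possibly on a different node. A rough sketch (`<alloc-id>` is a placeholder for a real allocation ID):

```
# Stop a single allocation; the scheduler creates a replacement,
# which may land on a different node.
nomad alloc stop <alloc-id>

# The same operation via the HTTP API:
curl -X POST "$NOMAD_ADDR/v1/allocation/<alloc-id>/stop"
```

This is a manual, one-allocation-at-a-time workaround rather than the cluster-wide rebalancer discussed above.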
I can stop an allocation with the Nomad UI in 0.10.4 and it will start on another node. Is there a plan for an automatic rebalance now?
+1 for rebalance. Stopping an allocation really doesn't help our scenarios. In our case we have changing node metadata that can cause allocations to move around, and given that jobs control their constraints, it's not desirable to go and figure all of that out ourselves. What we would expect is that the scheduler periodically looks for new nodes to place/rebalance allocations on, and ALSO looks for allocations that should be removed because they no longer meet their constraints. In our experiments, if we change a constraint attribute, the allocation will never leave the node until a new job update comes (even running --force-reschedule does not cause the allocation(s) to be re-evaluated).
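To make the scenario concrete, the kind of constraint involved might look like this (`${meta.rack}` and the values are hypothetical client metadata, not from this thread):

```hcl
job "db" {
  group "primary" {
    # Placement only happens on clients whose metadata matches.
    # As described above, if the client's meta changes afterwards,
    # the running allocation stays put until the job is re-evaluated
    # by a job update.
    constraint {
      attribute = "${meta.rack}"
      value     = "rack-1"
    }
  }
}
```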
Any news on this already? |
@robbah we'll update issues with news when we have it. Please feel free to add 👍 reactions to the top-level comment (which we do look at), but please don't spam issues with bumps. |
I'm not sure how simple this would be to implement, but it would be great if we could make `distinct_hosts` best effort. This would be useful when backend instances (clients in Nomad) need to be taken offline for maintenance or upgrades.

For example, if we have a task group with a count of 3, `distinct_hosts = best-effort`, and 3 Nomad clients, the task group would be distributed across the three instances as one container per instance. If we then take one of the three backends offline for maintenance (or it fails due to a kernel panic or networking issue), the scheduler would provision the container from that backend on one of the remaining backends. The scheduler would then detect that either the backend recovered or a new backend joined the Nomad cluster, and rebalance the containers to restore the distinct-hosts invariant.
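A sketch of the proposed syntax (hypothetical; this value is not implemented in Nomad, it is just the shape the request suggests):

```hcl
group "api" {
  count = 3

  # Hypothetical: spread across distinct hosts when possible,
  # but still schedule if fewer hosts than `count` are available.
  constraint {
    operator = "distinct_hosts"
    value    = "best-effort"
  }
}
```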