
Ability to rebalance allocation placements #1635

Open · ghost opened this issue Aug 22, 2016 · 22 comments

@ghost commented Aug 22, 2016

I'm not sure how simple this would be to implement, but it would be great if we could make distinct_hosts best-effort. This would be useful when backend instances (clients in Nomad) need to be taken offline for maintenance or upgrades.

For example, if we have a task group with a count of 3, distinct_hosts = best-effort and 3 Nomad clients, the task group would be distributed across the three instances as one container per instance.

If we then take offline one of the three backends for maintenance (or if it failed due to a kernel panic or networking issue) the scheduler would provision the container from that backend on one of the remaining backends. The scheduler would then detect that either the backend recovered or a new backend joined the Nomad cluster and rebalance the containers to restore the distinct hosts invariant.
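For reference, the hard form of the constraint in today's job spec looks like the sketch below; the best-effort mode is the hypothetical part of this request (job, group, and image names are placeholders):

```hcl
job "web" {
  group "app" {
    # Three instances of the task group.
    count = 3

    # Hard constraint: no two allocations of this task group may land
    # on the same client. The requested "best-effort" variant of this
    # does not exist in this syntax.
    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }

    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest"
      }
    }
  }
}
```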

@ghost ghost changed the title [feature] Best effort for distinct hosts [feature request] Best effort for distinct hosts Aug 22, 2016
@dadgar (Contributor) commented Aug 22, 2016

Nomad already does a best-effort spread between clients running the same job, so we wouldn't need to add that. I think the point you're getting at is that you would like Nomad to rebalance occasionally.

If I'm drawing the right conclusion, maybe we should retitle the issue?

@ghost (Author) commented Aug 22, 2016

> I think the point you are getting at is that you would like Nomad to rebalance

That, but also: I would like a guarantee that if I have 3 task groups and 2 or more instances, Nomad doesn't run all 3 containers on the same instance. Otherwise, if I need to take an instance offline, I can't do it without either taking the whole service down with it or increasing the count and hoping that more containers are started on other instances. At the same time, I don't want an unschedulable job because I have 2 instances and a count of 3 with distinct_hosts, if that makes sense. Is that a feature in the scheduler at present?

@dadgar (Contributor) commented Aug 23, 2016

It is not a feature currently, and I think rebalancing plus the current behavior would solve that: if you could initiate a rebalance, the scheduler would naturally want to spread the tasks across different hosts (even without the distinct_hosts constraint set).

I am going to rename the issue.

@dadgar dadgar changed the title [feature request] Best effort for distinct hosts Ability to rebalance allocation placements Aug 23, 2016
@jemc commented Mar 26, 2017

Being able to initiate a rebalance is a good idea.

But I'd also like to add that it would seem appropriate for Nomad to automatically attempt to do a rebalance upon failure to make an allocation (due to lack of resources). That is, I would expect Nomad to do a rebalance if doing so would make the allocation succeed. I was actually surprised to learn that this wasn't already implemented for a scheduler like Nomad.

@discobean commented:

+1. A best-effort distinct_hosts feature with an auto or manual rebalance option would certainly be helpful for us.

@MDL-Cloud-Ops commented:

> That, but also: I would like a guarantee that if I have 3 task groups and 2 or more instances, Nomad doesn't run all 3 containers on the same instance. Otherwise, if I need to take an instance offline, I can't do it without either taking the whole service down with it or increasing the count and hoping that more containers are started on other instances. At the same time, I don't want an unschedulable job because I have 2 instances and a count of 3 with distinct_hosts, if that makes sense. Is that a feature in the scheduler at present?

We face the same issue with Nomad 0.5.5; I assume this is still open.

To put it simply: we don't want to have to manually construct artificial job specifications (such as one task group per availability zone) in order to have a job with at least two instances spread across multiple AZs in AWS. Failing to spread creates a serious reliability impediment, and manually manipulating the scheduler undermines the value of having a scheduler in the first place.
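(For reference, later Nomad releases, 0.9 and up, added a spread stanza that expresses exactly this kind of placement preference without per-AZ groups. A minimal sketch, not 0.5.5 syntax, using Nomad's AWS fingerprint attribute:)

```hcl
group "app" {
  count = 2

  # Best-effort, weighted spread across AWS availability zones.
  # Unlike distinct_hosts, this never makes the job unschedulable.
  spread {
    attribute = "${attr.platform.aws.placement.availability-zone}"
    weight    = 100
  }
}
```

Note that spread only biases new placements; it does not move allocations that are already running, which is the rebalancing ask in this issue.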

Going further, I would argue that tight bin-packing needs to be balanced against reliability requirements. I would rather see relatively frequent, carefully orchestrated service movements than a single highly loaded server. I think this also plays into how nicely Nomad works with automatic scaling of clusters, such as AWS ASGs.

To me this is a fairly big feature, but also the next critical thing that determines whether Nomad will be the scheduler of choice or whether we will be forced to move to an alternate technology. It feels wholly wrong for the user of a scheduler to have to solve these inherent challenges of scheduling.

Thoughts, reactions?

@djenriquez commented:

Any updates or thoughts on this feature? HA is definitely a higher priority for us than efficiency; we would love allocations to be distributed onto new Nomad clients as they become available!

@daledude commented:

Even just the ability to re-evaluate a job away from its current node would be something.

@CumpsD commented Apr 13, 2018

Looking forward to this as well. I was amazed to see one job with 3 instances on 1 node while 2 other nodes, newly added to the cluster, sat doing nothing. I expected a rebalance toward the new nodes.

@alitvak69 commented:

We are at version 0.8.3 at this point. Is there a way to vote with money for a feature? I would suggest we could collectively pay HashiCorp to develop this very important one.

@hvindin commented May 29, 2018

@alitvak69 we might not need to go all the way to crowdfunding quite yet. I'm building an internal shared hosting platform at the moment, and auto-rebalancing across hosts, plus the ability to loosen the grip on bin-packing a bit so we can spray across a not-so-elastic internal cloud, are the things we need from HashiCorp in the next few months; otherwise we're going to need to follow the rest of the market and do the same Kubernetes thing as everyone else.

We're sure as hell not a fly-by-night tiny basement operation, and looking at how much money we're pouring into infrastructure automation for bad solutions that don't work, I suspect there are a few people willing to put a lot of investment into the HashiCorp ecosystem if we could just get these seemingly small kinks out of the way.

But seriously, I know we got the reschedule stanza recently, so there's obviously some recognition that shuffling jobs between existing nodes is desired behaviour in some scenarios. It would be nice to be able to encourage Nomad to be a bit more lax about keeping a job on the hottest node if that means more likely startup success and runtime stability, even if we end up with some extra capacity wasted on spare servers.
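For context, the reschedule stanza mentioned above only governs how failed allocations are moved to other eligible nodes; it does not touch healthy ones. A minimal group-level sketch (values are illustrative):

```hcl
group "app" {
  reschedule {
    attempts       = 3             # replacement attempts per interval
    interval       = "30m"         # window the attempts are counted in
    delay          = "30s"         # wait before the first replacement
    delay_function = "exponential" # back off between attempts
    max_delay      = "1h"          # cap on the backoff
    unlimited      = false         # stop after `attempts` is exhausted
  }
}
```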

@dadgar (Contributor) commented Jun 4, 2018

Hey all,

Just an update on this issue. We understand it is important, and there are plans for both short- and long-term solutions. The short-term plan is an allocation lifecycle API through which individual allocations can be killed and the scheduler will replace them. The longer-term solution is a rebalancer system that detects these issues and rebalances the cluster over time, or on demand via an API.

@jippi (Contributor) commented Jun 5, 2018

@dadgar sounds good! Is the short term Nomad 0.11?

@dadgar (Contributor) commented Jun 5, 2018

@jippi Aiming to have allocation life cycle APIs in the 0.9.X series.

@jippi (Contributor) commented Jun 5, 2018

@dadgar nice!

@suslovsergey commented:

@dadgar any news?

@KamilKeski commented:

@dadgar now that we are in the 0.9.x releases, is there a more solid target for the lifecycle APIs? Much appreciated!

@langmartin (Contributor) commented:

The allocation lifecycle APIs made it into 0.9.2, documented here:
https://www.nomadproject.io/api/allocations.html#stop-allocation
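In practice this means a misplaced allocation can be stopped with `nomad alloc stop <alloc-id>` (or the linked HTTP endpoint), and the scheduler will place a replacement. Where the replacement lands is still left to normal scheduling, so this is manual, one-allocation-at-a-time rebalancing rather than the automatic kind discussed above.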

@pashinin commented:

I can stop an allocation with the Nomad UI in 0.10.4 and it will start on another node.

Is there a plan for automatic rebalancing now?

@idrennanvmware (Contributor) commented:

+1 for rebalance.

Stopping an allocation really doesn't help our scenarios. In our case we have changing node metadata that should cause allocations to move around, and given that jobs control their constraints, it's not desirable to go and figure all of that out by hand. What we would expect is that the scheduler periodically looks for new nodes to place or rebalance allocations on, and ALSO looks for allocations that should be removed because they no longer meet their constraints.

In our experiments, if we change a constraint attribute, the allocation never leaves the node until a new job update arrives (even running --force-reschedule does not cause the allocation(s) to be re-evaluated); see the sketch below.
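A hypothetical illustration of the kind of constraint we mean, keyed on client metadata that operators change over time (the `maintenance` meta key is made up, not a Nomad built-in):

```hcl
# Group-level constraint keyed on client metadata. When an operator
# later flips "maintenance" to "true" in the client's config, new
# placements avoid the node, but already-running allocations stay put.
constraint {
  attribute = "${meta.maintenance}"
  value     = "false"
}
```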

@robbah commented Jun 22, 2022

Any news on this yet?

@tgross (Member) commented Jun 22, 2022

@robbah we'll update issues with news when we have it. Please feel free to add 👍 reactions to the top-level comment (which we do look at), but please don't spam issues with bumps.

@hashicorp locked as spam and limited conversation to collaborators May 20, 2024
@jrasell closed this as not planned May 20, 2024
@jrasell reopened this May 20, 2024