
Ability to rebalance allocation placements #1635

Open · ghost opened this issue Aug 22, 2016 · 22 comments

@ghost commented Aug 22, 2016

I'm not sure how simple this would be to implement, but it would be great if we could make distinct_hosts best-effort. This would be useful when backend instances (clients in Nomad) need to be taken offline for maintenance or upgrades.

For example, if we have a task group with a count of 3, distinct_hosts = best-effort and 3 Nomad clients, the task group would be distributed across the three instances as one container per instance.

If we then take offline one of the three backends for maintenance (or if it failed due to a kernel panic or networking issue) the scheduler would provision the container from that backend on one of the remaining backends. The scheduler would then detect that either the backend recovered or a new backend joined the Nomad cluster and rebalance the containers to restore the distinct hosts invariant.
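For reference, the hard form of the constraint in today's job spec looks like the sketch below; the best-effort mode is the hypothetical part of this request (job, group, and image names are placeholders):

```hcl
job "web" {
  group "app" {
    # Three instances of the task group.
    count = 3

    # Hard constraint: no two allocations of this task group may land
    # on the same client. The requested "best-effort" variant of this
    # does not exist in this syntax.
    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }

    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest"
      }
    }
  }
}
```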

@ghost ghost changed the title [feature] Best effort for distinct hosts [feature request] Best effort for distinct hosts Aug 22, 2016
@dadgar (Contributor) commented Aug 22, 2016

Nomad already does a best-effort spread between clients running the same job, so we wouldn't need to add that. I think the point you're getting at is that you would like Nomad to rebalance occasionally.

If I'm drawing the right conclusion, maybe we should retitle the issue?

@ghost (Author) commented Aug 22, 2016

> I think the point you are getting at is that you would like Nomad to rebalance

That, but also: I would like a guarantee that if I have 3 task groups and 2 or more instances, Nomad doesn't run all 3 containers on the same instance. Otherwise, if I need to take an instance offline, I can't do it without either taking the whole service down with it or increasing the count and hoping that more containers are started on other instances. At the same time, I don't want an unschedulable job because I have 2 instances and a count of 3 with distinct_hosts, if that makes sense. Is that a feature in the scheduler at present?

@dadgar (Contributor) commented Aug 23, 2016

It is not a feature currently, and I think rebalancing plus the current behavior would solve that: if you could initiate a rebalance, the scheduler would naturally want to spread the tasks across different hosts (even without the distinct_hosts constraint set).

I am going to rename the issue.

@dadgar dadgar changed the title [feature request] Best effort for distinct hosts Ability to rebalance allocation placements Aug 23, 2016
@jemc commented Mar 26, 2017

Being able to initiate a rebalance is a good idea.

But I'd also like to add that it would seem appropriate for Nomad to automatically attempt to do a rebalance upon failure to make an allocation (due to lack of resources). That is, I would expect Nomad to do a rebalance if doing so would make the allocation succeed. I was actually surprised to learn that this wasn't already implemented for a scheduler like Nomad.

@discobean commented:

+1. A best-effort distinct_hosts feature with an auto or manual rebalance option would certainly be helpful for us.

@MDL-Cloud-Ops commented:

> That, but also: I would like a guarantee that if I have 3 task groups and 2 or more instances, Nomad doesn't run all 3 containers on the same instance. Otherwise, if I need to take an instance offline, I can't do it without either taking the whole service down with it or increasing the count and hoping that more containers are started on other instances. At the same time, I don't want an unschedulable job because I have 2 instances and a count of 3 with distinct_hosts, if that makes sense. Is that a feature in the scheduler at present?

We face the same issue with Nomad 0.5.5; I assume this is still open.

To put it simply: we don't want to have to manually construct artificial job specifications (such as one task group per availability zone) in order to have a job with at least two instances spread across multiple AZs in AWS. Failing to spread creates a serious reliability impediment, and manually manipulating the scheduler undermines the value of having a scheduler in the first place.
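(For reference, later Nomad releases, 0.9 and up, added a spread stanza that expresses exactly this kind of placement preference without per-AZ groups. A minimal sketch, not 0.5.5 syntax, using Nomad's AWS fingerprint attribute:)

```hcl
group "app" {
  count = 2

  # Best-effort, weighted spread across AWS availability zones.
  # Unlike distinct_hosts, this never makes the job unschedulable.
  spread {
    attribute = "${attr.platform.aws.placement.availability-zone}"
    weight    = 100
  }
}
```

Note that spread only biases new placements; it does not move allocations that are already running, which is the rebalancing ask in this issue.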

Going further, I would argue that tight bin-packing needs to be balanced against reliability requirements. I would rather see relatively frequent, carefully orchestrated service movements than a single highly loaded server. I think this also plays into how nicely Nomad works with automatic scaling of clusters, such as AWS ASGs.

To me this is a fairly big feature, but also the next critical thing that determines whether Nomad will be the scheduler of choice or whether we will be forced to move to an alternate technology. It feels wholly wrong for the user of a scheduler to have to solve these inherent challenges of scheduling.

Thoughts, reactions?

@djenriquez commented:

Any updates or thoughts on this feature? HA is definitely a higher priority for us than efficiency; we would love allocations to be distributed onto new Nomad clients as they become available!

@daledude commented:

Even just the ability to re-evaluate a job away from its current node would be something.

@CumpsD commented Apr 13, 2018

Looking forward to this as well. I was amazed to see one job with 3 instances on 1 node while 2 other nodes, newly added to the cluster, sat doing nothing. I expected a rebalance toward the new nodes.

@alitvak69 commented:

We are at version 0.8.3 at this point. Is there a way to vote with money for a feature? I would suggest we could collectively pay HashiCorp to develop this very important one.

@hvindin commented May 29, 2018

@alitvak69 we might not need to go all the way to crowdfunding quite yet. I'm building an internal shared hosting platform at the moment, and auto-rebalancing across hosts, plus the ability to loosen the grip on bin-packing a bit so we can spray across a not-so-elastic internal cloud, are the things we need from HashiCorp in the next few months; otherwise we're going to need to follow the rest of the market and do the same Kubernetes thing as everyone else.

We're sure as hell not a fly-by-night tiny basement operation, and looking at how much money we're pouring into infrastructure automation for bad solutions that don't work, I suspect there are a few people willing to put a lot of investment into the HashiCorp ecosystem if we could just get these seemingly small kinks out of the way.

But seriously, I know we got the reschedule stanza recently, so there's obviously some recognition that shuffling jobs between existing nodes is desired behaviour in some scenarios. It would be nice to be able to encourage Nomad to be a bit more lax about keeping a job on the hottest node if that means more likely startup success and runtime stability, even if we end up with some extra capacity wasted on spare servers.
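For context, the reschedule stanza mentioned above only governs how failed allocations are moved to other eligible nodes; it does not touch healthy ones. A minimal group-level sketch (values are illustrative):

```hcl
group "app" {
  reschedule {
    attempts       = 3             # replacement attempts per interval
    interval       = "30m"         # window the attempts are counted in
    delay          = "30s"         # wait before the first replacement
    delay_function = "exponential" # back off between attempts
    max_delay      = "1h"          # cap on the backoff
    unlimited      = false         # stop after `attempts` is exhausted
  }
}
```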

@dadgar (Contributor) commented Jun 4, 2018

Hey all,

Just an update on this issue. We understand it is important, and there are plans for both short- and long-term solutions. The short-term plan is an allocation lifecycle API through which individual allocations can be killed and the scheduler will replace them. The longer-term solution is a rebalancer system that detects these issues and rebalances the cluster over time, or on demand via an API.

@jippi (Contributor) commented Jun 5, 2018

@dadgar sounds good! Is the short term Nomad 0.11?

@dadgar (Contributor) commented Jun 5, 2018

@jippi Aiming to have allocation life cycle APIs in the 0.9.X series.

@jippi (Contributor) commented Jun 5, 2018

@dadgar nice!

@suslovsergey commented:

@dadgar any news?

@KamilKeski commented:

@dadgar now that we are in the 0.9.x releases, is there a more solid target for the lifecycle APIs? Much appreciated!

@langmartin (Contributor) commented:

The allocation lifecycle APIs made it into 0.9.2, documented here:
https://www.nomadproject.io/api/allocations.html#stop-allocation
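In practice this means a misplaced allocation can be stopped with `nomad alloc stop <alloc-id>` (or the linked HTTP endpoint), and the scheduler will place a replacement. Where the replacement lands is still left to normal scheduling, so this is manual, one-allocation-at-a-time rebalancing rather than the automatic kind discussed above.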

@pashinin commented:

I can stop an allocation with the Nomad UI in 0.10.4 and it will start on another node.

Is there a plan for automatic rebalancing now?

@idrennanvmware (Contributor) commented:

+1 for rebalance.

Stopping an allocation really doesn't help our scenarios. In our case we have changing node metadata that should cause allocations to move around, and given that jobs control their constraints, it's not desirable to go and figure all of that out by hand. What we would expect is that the scheduler periodically looks for new nodes to place or rebalance allocations on, and ALSO looks for allocations that should be removed because they no longer meet their constraints.

In our experiments, if we change a constraint attribute, the allocation never leaves the node until a new job update arrives (even running --force-reschedule does not cause the allocation(s) to be re-evaluated); see the sketch below.
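A hypothetical illustration of the kind of constraint we mean, keyed on client metadata that operators change over time (the `maintenance` meta key is made up, not a Nomad built-in):

```hcl
# Group-level constraint keyed on client metadata. When an operator
# later flips "maintenance" to "true" in the client's config, new
# placements avoid the node, but already-running allocations stay put.
constraint {
  attribute = "${meta.maintenance}"
  value     = "false"
}
```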

@robbah commented Jun 22, 2022

Any news on this yet?

@tgross (Member) commented Jun 22, 2022

@robbah we'll update issues with news when we have it. Please feel free to add 👍 reactions to the top-level comment (which we do look at), but please don't spam issues with bumps.

@hashicorp locked as spam and limited conversation to collaborators May 20, 2024
@jrasell closed this as not planned May 20, 2024
@jrasell reopened this May 20, 2024