
Drain nodes when they terminate #7119

Closed
grosser opened this issue Jun 7, 2019 · 39 comments
Labels
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
hacktoberfest: Issues that are good to work on, or people are working on, for hacktoberfest.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@grosser commented Jun 7, 2019

When the ASG scales down, the node should be drained rather than just terminated (which kills pods without respecting PodDisruptionBudgets).

This could be done either via "Amazon EC2 Auto Scaling Lifecycle Hooks" (up to 60 min) or with a termination script https://github.com/miglen/aws/blob/master/ec2/run-script-on-ec2-instance-termination.md (2 min max, so timing will be tight).

The flow could be Scale-Down -> SQS -> Drainer, or Scale-Down -> SQS -> Node status -> https://github.com/planetlabs/draino
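For the termination-script route, the shutdown hook could be wired through systemd. A hypothetical sketch (the unit name, kubeconfig path, and timeout are assumptions; `%H` is systemd's hostname specifier and must match the Node name):

```ini
# /etc/systemd/system/node-drain.service (hypothetical)
[Unit]
Description=Drain this node before shutdown
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# ExecStop runs on the way down, while the network is still up (~2 min budget)
ExecStop=/usr/local/bin/kubectl --kubeconfig /var/lib/kubelet/kubeconfig \
  drain %H --ignore-daemonsets --delete-emptydir-data --timeout=110s

[Install]
WantedBy=multi-user.target
```

Because the unit is `Type=oneshot` with `RemainAfterExit=yes`, it stays "active" after boot, so its `ExecStop` fires during shutdown ordering.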

We might be able to contribute this, but need some "yes that's a good idea" / "yes we want this" first :)

@pracucci commented Jun 7, 2019

I would personally suggest starting with the easiest approach, which looks to be the termination script, and then iterating on it over time.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Sep 5, 2019
@grosser (Author) commented Sep 5, 2019 via email

@k8s-ci-robot removed the lifecycle/stale label Sep 5, 2019
@so0k (Contributor) commented Sep 11, 2019

I did a detailed write-up on setting up LCH / SQS / node-drainer here:

https://mofoseng.github.io/posts/eks-migration/#node-maintenance-autoscaling-and-lifecyclehooks

I compared the kube-aws drainer (which uses a single-replica deployment updating a ConfigMap, plus a DS on each node grepping that ConfigMap...) with a pure AWS Lambda-based approach, and this SQS approach seemed the most robust.

It's not kops-specific, but it may help adoption (I moved companies and am now back on kops after migrating away from it at my last company).
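The LCH -> SQS -> drainer flow boils down to consuming lifecycle-hook notifications from the queue. A minimal, hypothetical sketch of just the message-parsing step (field names follow the ASG lifecycle-hook notification format; the actual drain and the complete-lifecycle-action call are left out):

```python
import json

def parse_lifecycle_message(body: str):
    """Return (instance_id, action_token) for termination events, else None."""
    msg = json.loads(body)
    # Only act on termination hooks; ignore launch hooks and test notifications
    if msg.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return None
    return msg["EC2InstanceId"], msg["LifecycleActionToken"]

# Example message body as it would arrive from the SQS queue
sample = json.dumps({
    "LifecycleHookName": "node-drainer",
    "AutoScalingGroupName": "nodes.example.com",
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
    "EC2InstanceId": "i-0123456789abcdef0",
    "LifecycleActionToken": "token-123",
})
print(parse_lifecycle_message(sample))  # ('i-0123456789abcdef0', 'token-123')
```

A real drainer would then cordon and evict, and finally call `complete-lifecycle-action` with the token so the ASG proceeds with termination.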

@grosser (Author) commented Sep 11, 2019 via email

@so0k (Contributor) commented Sep 13, 2019

@grosser - how do you currently manage LCH creation for kops? Do you generate TF and write config around that, or did you fork kops to add that functionality?

@grosser (Author) commented Sep 13, 2019 via email

@so0k (Contributor) commented Sep 13, 2019

Sorry for going off-topic.

I'm also trying to taint spot instances in a mixedInstancePolicy. For this I see two approaches (without modifying kops):

  1. A kubelet systemd service unit drop-in, as a hook or an asset, which adds --register-with-taints / --node-labels to the $DAEMON_ARGS based on the output of aws ec2 describe-instances --instance-ids ${iid} --query 'Reservations[0].Instances[0].InstanceLifecycle'
  2. A DS similar to iameli/kube-node-labeller that sets taints and labels

The difference is that option 1 ensures taints are set before node registration, while option 2 takes effect at some later point in time (possibly after workloads that should not tolerate the taint have already started running on the tainted node...).

With the EKS bootstrap it is quite simple to ensure nodes are labeled/tainted properly before they register with the API (and kops only supports labels/taints at the IG level, not per node).
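Option 1's decision step is small. A hypothetical sketch of the mapping (--register-with-taints and --node-labels are real kubelet flags; the lifecycle=spot key/value is an example, and in practice this would run in a shell hook that appends the result to $DAEMON_ARGS before kubelet starts):

```python
def kubelet_extra_args(instance_lifecycle: str) -> str:
    """Map the InstanceLifecycle value from `aws ec2 describe-instances`
    to extra kubelet flags so spot nodes are tainted before registration."""
    if instance_lifecycle == "spot":
        return ("--register-with-taints=lifecycle=spot:NoSchedule "
                "--node-labels=lifecycle=spot")
    return ""  # on-demand instances report no InstanceLifecycle

print(kubelet_extra_args("spot"))
```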

@so0k (Contributor) commented Sep 17, 2019

For now we implemented option 2: https://github.com/compareasiagroup/kube-node-lifecycle-labeller

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 16, 2019
@grosser (Author) commented Dec 16, 2019

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Dec 16, 2019
@kforsthoevel

Another way of handling node draining on termination, taken from Zalando's kubernetes-on-aws: https://github.com/zalando-incubator/kubernetes-on-aws/blob/449f8f3bf5c60e0d319be538460ff91266337abc/cluster/userdata-worker.yaml#L92-L120

@kforsthoevel

I have just implemented node drain via systemd. The systemd unit gets provisioned via kops hooks, and the kubeconfig is written to disk by a DaemonSet.

Works like a charm. Thanks @thomaspeitz for your support.

@paalkr commented Mar 1, 2020

@kforsthoevel, that sounds very interesting. Do you mind sharing the implementation details?

@kforsthoevel

@paalkr I will write a little blog post about it and let you know.

@johngmyers (Member)

The kops hook I saw by following links was a good proof of concept, but it had the problem that it assumed the container runtime was Docker.

@kforsthoevel

@paalkr Here is the blog post: https://tech.ivx.com/how-to-drain-nodes-before-they-get-terminated-by-aws

@paalkr commented Mar 6, 2020

Thx!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 4, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 4, 2020
@johngmyers (Member)

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/rotten label Jul 4, 2020
@johngmyers added the good first issue label Sep 11, 2020
@justinsb added the hacktoberfest label Oct 1, 2020
@bwagner5 (Contributor)

I've chatted with @olemarkus on Slack about adding aws-node-termination-handler (NTH) as an add-on for kops.

NTH operates in two different modes: IMDS Processor (the old way) and Queue-Processor. I think the queue-processor mode would be a good fit for kops since it's more resilient and can respond to more types of events (spot ITNs, spot rebalance recommendations, EC2 instance status changes, ASG termination lifecycle hooks, and more coming).

NTH queue-processor mode works by listening to an SQS queue fed with events from Amazon EventBridge. kops can do a lot of the heavy lifting by setting up the Amazon EventBridge rules appropriately and creating an SQS queue.
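For illustration, the EventBridge rules kops would provision might match event patterns like these, one rule per pattern, each targeting the NTH queue (the `detail-type` strings follow the published EC2/ASG event types; the exact set kops enables is an open design question):

```json
[
  { "source": ["aws.autoscaling"],
    "detail-type": ["EC2 Instance-terminate Lifecycle Action"] },
  { "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"] },
  { "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance Rebalance Recommendation"] },
  { "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"] }
]
```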

@paalkr commented Nov 30, 2020

Sounds good. I have used CloudFormation to create the EventBridge rules and SQS queue for now, but integrating closer into Kops would be a nice addition.

@paalkr commented Nov 30, 2020

BTW, I ended up using the kubernetes.io/cluster/<cluster-name> tag as the managedAsgTag, as this tag is already created and managed by kops ;)

Ref: aws/aws-node-termination-handler#272

@olemarkus (Member)

So the first step would be to add support for provisioning SQS + EventBridge rules, then have cloudup provision those if NTH is enabled.

There are a number of changes needed to the template too. I think it is probably best to just have two separate templates and pick one based on which mode NTH is set to use. I would run the NTH deployment on the masters and use the master IAM role for authenticating to SQS.

@paalkr commented Dec 1, 2020

Also, an ASG lifecycle hook needs to be added to each ASG, so that nodes stay in the ASG until NTH has finished draining the node and sends the ASG lifecycle continue signal. The behavior of NTH also needs to be coordinated with the kops rolling-update cluster command. Which component is responsible for sending the ASG lifecycle continue signal during a rolling update, NTH or kops? Both kops and NTH will also try to drain nodes during rolling updates.
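As a sketch of that hook setup (hook name, ASG name, instance id, and timeout are examples), each ASG would get something like:

```sh
# Illustrative only: add a termination lifecycle hook to one ASG
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name nth-drain \
  --auto-scaling-group-name nodes.example.com \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --default-result CONTINUE

# Whichever component finishes the drain then releases the instance
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name nth-drain \
  --auto-scaling-group-name nodes.example.com \
  --lifecycle-action-result CONTINUE \
  --instance-id i-0123456789abcdef0
```

With `--default-result CONTINUE`, a missed heartbeat still lets the instance terminate rather than wedging the ASG.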

@olemarkus (Member)

I think kops should just let NTH do the drain and terminate.

I can see a problem when using --cloudonly, though. In that case, I think kops can just send the signal immediately.

But we can also implement some of this without the ASG lifecycle hooks; adding things incrementally may be a benefit here.

It looks like you both have a much better overview on how to approach this than I do. I can definitely help out with how to add the various bits.

Would any of you be able to try your hand at a PR for (parts of) this?

@bwagner5 (Contributor) commented Dec 1, 2020

I can try to get something out. I'm not too familiar with the kops codebase, so I'll need some help as well. I'll be on vacation for much of December, so I probably can't do anything soon. But hopefully we can hammer out a plan and make sure this is going to work nicely.

How do the other cloud providers handle draining? I was going to say that if NTH is the termination-handling component, then the kops controller should just let NTH do the draining, but I'm not sure that's really possible, since the kops controller would still need to finish the drain on the other providers. It's probably not the end of the world if both drain, since the eviction API calls are idempotent. Whoever completes the ASG lifecycle hook first wins, but it's kind of awkward.

@olemarkus (Member)

Sounds good!

The code that drains a node can be found here: https://github.com/kubernetes/kops/blob/master/pkg/instancegroups/instancegroups.go#L328
This has access to the cluster spec, so you can skip the drain call if NTH is enabled and jump straight to deleteNode. Our validation logic prevents NTH from being enabled on other clouds, so you don't have to worry about those.

One bit in the code linked above that you need to take into consideration is that it deletes the k8s Node object. That bit also needs to be skipped; I assume NTH does that on its own.

For the AWS provisioning piece, see https://github.com/kubernetes/kops/tree/master/pkg/model
I would assume you need a task for EventBridge and one for SQS.

@paalkr commented Dec 2, 2020

Unfortunately I'm not a developer, but I can contribute by testing and by discussing design and implementation specs in general.

@johngmyers (Member)

Since draining is idempotent, it is not necessary to disable the drain code in rolling update. It is advantageous to do the bulk of the draining from rolling update, as that makes what is going on more visible in the rolling-update logs.

@olemarkus (Member)

@bwagner5, have you had any time to look into this yet?

@haugenj (Contributor) commented Jan 28, 2021

👋 Hey @olemarkus, I'm going to take this over from @bwagner5 as he's caught up with some other work for now. If I have questions, where's the best place to ask them? Here, Slack, maybe an incomplete/draft PR?

@olemarkus (Member)

Hey @haugenj. Any way you like. I imagine Slack is easiest in the beginning, and then a draft PR once you have a rough implementation.

@olemarkus (Member)

Hi @haugenj. Have you had the chance to look into this yet? Anything I can help out with?

@haugenj (Contributor) commented Mar 2, 2021

@olemarkus yeah, I've got the SQS provisioning done but I'm still working on the EventBridge rules. Once those are done I'll open a draft PR, hopefully by the end of this week 🤞

@SD-13 commented Oct 27, 2023

Seems like #10995 resolved this issue. Can we close this?

@grosser (Author) commented Oct 27, 2023

yeah thx, that looks like it solves it :)

@grosser closed this as completed Oct 27, 2023