
Drain nodes when they terminate #7119

Closed
grosser opened this issue Jun 7, 2019 · 39 comments
Labels
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
hacktoberfest: Issues that are good to work on, or people are working on, for hacktoberfest.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@grosser commented Jun 7, 2019

When the ASG scales down, the node should be drained rather than just terminated (which kills pods without respecting PodDisruptionBudgets).

This could be done either via "Amazon EC2 Auto Scaling Lifecycle Hooks" (up to 60 min) or with a termination script https://github.com/miglen/aws/blob/master/ec2/run-script-on-ec2-instance-termination.md (2 min max, so timing will be tight).

The flow could be Scale-Down -> SQS -> Drainer, or Scale-Down -> SQS -> Node status -> https://github.com/planetlabs/draino
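For the termination-script route, the shutdown hook could be wired through systemd. A hypothetical sketch (the unit name, kubeconfig path, and timeout are assumptions; `%H` is systemd's hostname specifier and must match the Node name):

```ini
# /etc/systemd/system/node-drain.service (hypothetical)
[Unit]
Description=Drain this node before shutdown
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# ExecStop runs on the way down, while the network is still up (~2 min budget)
ExecStop=/usr/local/bin/kubectl --kubeconfig /var/lib/kubelet/kubeconfig \
  drain %H --ignore-daemonsets --delete-emptydir-data --timeout=110s

[Install]
WantedBy=multi-user.target
```

Because the unit is `Type=oneshot` with `RemainAfterExit=yes`, it stays "active" after boot, so its `ExecStop` fires during shutdown ordering.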

We might be able to contribute this, but need some "yes that's a good idea" / "yes we want this" first :)

@pracucci commented Jun 7, 2019

I would personally suggest starting with the easiest approach, which looks to be the termination script, and then iterating on it over time.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Sep 5, 2019
@grosser (Author) commented Sep 5, 2019 via email

@k8s-ci-robot removed the lifecycle/stale label Sep 5, 2019
@so0k (Contributor) commented Sep 11, 2019

I did a detailed write-up on setting up LCH / SQS / node-drainer here:

https://mofoseng.github.io/posts/eks-migration/#node-maintenance-autoscaling-and-lifecyclehooks

I compared the kube-aws drainer (which uses a single-replica deployment updating a ConfigMap, plus a DS on each node grepping that ConfigMap...) with a pure AWS Lambda-based approach, and this SQS approach seemed the most robust.

It's not kops-specific, but it may help adoption (I moved companies and am now back on kops after migrating away from it at my last company).
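The LCH -> SQS -> drainer flow boils down to consuming lifecycle-hook notifications from the queue. A minimal, hypothetical sketch of just the message-parsing step (field names follow the ASG lifecycle-hook notification format; the actual drain and the complete-lifecycle-action call are left out):

```python
import json

def parse_lifecycle_message(body: str):
    """Return (instance_id, action_token) for termination events, else None."""
    msg = json.loads(body)
    # Only act on termination hooks; ignore launch hooks and test notifications
    if msg.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return None
    return msg["EC2InstanceId"], msg["LifecycleActionToken"]

# Example message body as it would arrive from the SQS queue
sample = json.dumps({
    "LifecycleHookName": "node-drainer",
    "AutoScalingGroupName": "nodes.example.com",
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
    "EC2InstanceId": "i-0123456789abcdef0",
    "LifecycleActionToken": "token-123",
})
print(parse_lifecycle_message(sample))  # ('i-0123456789abcdef0', 'token-123')
```

A real drainer would then cordon and evict, and finally call `complete-lifecycle-action` with the token so the ASG proceeds with termination.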

@grosser (Author) commented Sep 11, 2019 via email

@so0k (Contributor) commented Sep 13, 2019

@grosser - how do you currently manage LCH creation for kops? Do you generate TF and write config around that, or did you fork kops to add that functionality?

@grosser (Author) commented Sep 13, 2019 via email

@so0k (Contributor) commented Sep 13, 2019

Sorry for going off-topic.

I'm also trying to taint spot instances in a mixedInstancePolicy. For this I see two approaches (without modifying kops):

  1. A kubelet systemd service unit drop-in, as a hook or an asset, which adds --register-with-taints / --node-labels to the $DAEMON_ARGS based on the output of aws ec2 describe-instances --instance-ids ${iid} --query 'Reservations[0].Instances[0].InstanceLifecycle'
  2. A DS similar to iameli/kube-node-labeller that sets taints and labels

The difference is that option 1 ensures taints are set before node registration, while option 2 takes effect at some later point in time (possibly after workloads that should not tolerate the taint have already started running on the tainted node...).

With the EKS bootstrap it is quite simple to ensure nodes are labeled/tainted properly before they register with the API (and kops only supports labels/taints at the IG level, not per node).
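Option 1's decision step is small. A hypothetical sketch of the mapping (--register-with-taints and --node-labels are real kubelet flags; the lifecycle=spot key/value is an example, and in practice this would run in a shell hook that appends the result to $DAEMON_ARGS before kubelet starts):

```python
def kubelet_extra_args(instance_lifecycle: str) -> str:
    """Map the InstanceLifecycle value from `aws ec2 describe-instances`
    to extra kubelet flags so spot nodes are tainted before registration."""
    if instance_lifecycle == "spot":
        return ("--register-with-taints=lifecycle=spot:NoSchedule "
                "--node-labels=lifecycle=spot")
    return ""  # on-demand instances report no InstanceLifecycle

print(kubelet_extra_args("spot"))
```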

@so0k (Contributor) commented Sep 17, 2019

For now we implemented option 2: https://github.com/compareasiagroup/kube-node-lifecycle-labeller

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 16, 2019
@grosser (Author) commented Dec 16, 2019

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Dec 16, 2019
@kforsthoevel

Another way of handling node draining on termination, taken from Zalando's kubernetes-on-aws: https://github.com/zalando-incubator/kubernetes-on-aws/blob/449f8f3bf5c60e0d319be538460ff91266337abc/cluster/userdata-worker.yaml#L92-L120

@kforsthoevel

I have just implemented node drain via systemd. The systemd unit gets provisioned via kops hooks, and the kubeconfig is written to disk by a DaemonSet.

Works like a charm. Thanks @thomaspeitz for your support.

@paalkr commented Mar 1, 2020

@kforsthoevel, that sounds very interesting. Do you mind sharing the implementation details?

@kforsthoevel

@paalkr I will write a little blog post about it and let you know.

@johngmyers (Member)

The kops hook I saw by following links was a good proof of concept, but it had the problem that it assumed the container runtime was Docker.

@kforsthoevel

@paalkr Here is the blog post: https://tech.ivx.com/how-to-drain-nodes-before-they-get-terminated-by-aws

@paalkr commented Mar 6, 2020

Thx!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 4, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 4, 2020
@johngmyers (Member)

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/rotten label Jul 4, 2020
@johngmyers added the good first issue label Sep 11, 2020
@justinsb added the hacktoberfest label Oct 1, 2020
@bwagner5 (Contributor)

I've chatted with @olemarkus on Slack about adding aws-node-termination-handler (NTH) as an add-on for kops.

NTH operates in two different modes: IMDS Processor (the old way) and Queue-Processor. I think the queue-processor mode would be a good fit for kops since it's more resilient and can respond to more types of events (spot ITNs, spot rebalance recommendations, EC2 instance status changes, ASG termination lifecycle hooks, and more coming).

NTH queue-processor mode works by listening to an SQS queue fed with events from Amazon EventBridge. kops can do a lot of the heavy lifting by setting up the Amazon EventBridge rules appropriately and creating an SQS queue.
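For illustration, the EventBridge rules kops would provision might match event patterns like these, one rule per pattern, each targeting the NTH queue (the `detail-type` strings follow the published EC2/ASG event types; the exact set kops enables is an open design question):

```json
[
  { "source": ["aws.autoscaling"],
    "detail-type": ["EC2 Instance-terminate Lifecycle Action"] },
  { "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"] },
  { "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance Rebalance Recommendation"] },
  { "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"] }
]
```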

@paalkr commented Nov 30, 2020

Sounds good. I have used CloudFormation to create the EventBridge rules and SQS queue for now, but integrating closer into Kops would be a nice addition.

@paalkr commented Nov 30, 2020

BTW, I ended up using the kubernetes.io/cluster/<cluster-name> tag as the managedAsgTag, as this tag is already created and managed by kops ;)

Ref: aws/aws-node-termination-handler#272

@olemarkus (Member)

So the first step would be to add support for provisioning SQS + EventBridge rules, then have cloudup provision those if NTH is enabled.

There are a number of changes needed to the template too. I think it is probably best to just have two separate templates and pick one based on which mode NTH is set to use. I would run the NTH deployment on the masters and use the master IAM role for authenticating to SQS.

@paalkr commented Dec 1, 2020

Also, an ASG lifecycle hook needs to be added to each ASG, so that nodes stay in the ASG until NTH has finished draining the node and sends the ASG lifecycle continue signal. The behavior of NTH also needs to be coordinated with the kops rolling-update cluster command. Which component is responsible for sending the ASG lifecycle continue signal during a rolling update, NTH or kops? Both kops and NTH will also try to drain nodes during rolling updates.
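As a sketch of that hook setup (hook name, ASG name, instance id, and timeout are examples), each ASG would get something like:

```sh
# Illustrative only: add a termination lifecycle hook to one ASG
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name nth-drain \
  --auto-scaling-group-name nodes.example.com \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --default-result CONTINUE

# Whichever component finishes the drain then releases the instance
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name nth-drain \
  --auto-scaling-group-name nodes.example.com \
  --lifecycle-action-result CONTINUE \
  --instance-id i-0123456789abcdef0
```

With `--default-result CONTINUE`, a missed heartbeat still lets the instance terminate rather than wedging the ASG.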

@olemarkus (Member)

I think kops should just let NTH do the drain and terminate.

I can see a problem when using --cloudonly, though. In that case, I think kops can just send the signal immediately.

But we can also implement some of this without the ASG lifecycle hooks; adding things incrementally may be a benefit here.

It looks like you both have a much better overview on how to approach this than I do. I can definitely help out with how to add the various bits.

Would any of you be able to try your hand at a PR for (parts of) this?

@bwagner5 (Contributor) commented Dec 1, 2020

I can try to get something out. I'm not too familiar with the kops codebase, so I'll need some help as well. I'll be on vacation for much of December, so I probably can't do anything soon. But hopefully we can hammer out a plan and make sure this is going to work nicely.

How do the other cloud providers handle draining? I was going to say that if NTH is the termination-handling component, then the kops controller should just let NTH do the draining, but I'm not sure that's really possible, since the kops controller would still need to finish the drain on the other providers. It's probably not the end of the world if both drain, since the eviction API calls are idempotent. Whoever completes the ASG lifecycle hook first wins, but it's kind of awkward.

@olemarkus (Member)

Sounds good!

The code that drains a node can be found here: https://github.com/kubernetes/kops/blob/master/pkg/instancegroups/instancegroups.go#L328
This has access to the cluster spec, so you can skip the drain call if NTH is enabled and jump straight to deleteNode. Our validation logic prevents NTH from being enabled on other clouds, so you don't have to worry about those.

One bit in the code linked above that you need to take into consideration is that it deletes the k8s Node object. That bit also needs to be skipped; I assume NTH does that on its own.

For the AWS provisioning piece, see https://github.com/kubernetes/kops/tree/master/pkg/model
I would assume you need a task for EventBridge and one for SQS.

@paalkr commented Dec 2, 2020

Unfortunately I'm not a developer, but I can contribute by testing and by discussing design and implementation specs in general.

@johngmyers (Member)

Since draining is idempotent, it is not necessary to disable the drain code in rolling update. It is advantageous to do the bulk of the draining from rolling update, as that makes what is going on more visible in the rolling-update logs.

@olemarkus (Member)

@bwagner5, have you had any time to look into this yet?

@haugenj (Contributor) commented Jan 28, 2021

👋 Hey @olemarkus, I'm going to take this over from @bwagner5 as he's caught up with some other work for now. If I have questions, where's the best place to ask them? Here, Slack, maybe an incomplete/draft PR?

@olemarkus (Member)

Hey @haugenj. Any way you like. I imagine Slack is easiest in the beginning, and then a draft PR once you have a rough implementation.

@olemarkus (Member)

Hi @haugenj. Have you had the chance to look into this yet? Anything I can help out with?

@haugenj (Contributor) commented Mar 2, 2021

@olemarkus yeah, I've got the SQS provisioning done but I'm still working on the EventBridge rules. Once those are done I'll open a draft PR, hopefully by the end of this week 🤞

@SD-13 commented Oct 27, 2023

Seems like #10995 resolved this issue. Can we close this?

@grosser (Author) commented Oct 27, 2023

yeah thx, that looks like it solves it :)

@grosser closed this as completed Oct 27, 2023