This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

locksmith: CoreOS autoupdate & Kubernetes node drain (klocksmith) #1274

Closed
skinny opened this issue May 10, 2016 · 16 comments

Comments

@skinny

skinny commented May 10, 2016

When running certain multi-pod applications (a Redis cluster in our case), sometimes during a CoreOS update run (installing & rebooting every node) the majority or all of the pods (3 in our example) end up on one physical machine. When that machine is rebooted, the Redis cluster is lost and (for now) requires manual intervention to get back up.

I learned that Kubernetes 1.2 introduced node drain functionality; this would be a great feature to use before rebooting a Kubernetes-enabled CoreOS node.

Are there any plans for implementing this kind of behaviour (relocating all the pods before a reboot), or does anyone know another way of avoiding this kind of scenario?

Mark

@philips philips changed the title CoreOS autoupdate & Kubernetes node drain locksmith: CoreOS autoupdate & Kubernetes node drain May 10, 2016
@philips

philips commented May 10, 2016

Hey Mark! I have been wanting to write a design doc on this. Here is a first draft: https://docs.google.com/document/d/1DHiB2UDBYRU6QSa2e9mCNla1qBivZDqYjBVn_DvzDWc/edit#

@skinny
Author

skinny commented May 10, 2016

Hi, thanks for the quick response!

I read your draft and I'm wondering about the need for the second "update-manager" pod. Wouldn't that introduce more issues, for example when that pod is scheduled on the node that needs rebooting?
_scrap that_

Also, on Loop 1, step 2: did you mean to tag no more than N nodes with the ok-to-reboot tag?

Mark

PS if you are still in Berlin, maybe we can have a quick chat?

@philips

philips commented May 10, 2016

@skinny Happy to chat. I am in Berlin until Saturday.

@philips

philips commented Jun 25, 2016

@skinny Still interested in working on this?

@chrissnell

+1 for this!

@yogurtnaturalny

Hi, this idea is very cool: locksmith would tell Kubernetes to evacuate the containers before an update. Something like: CoreOS tells etcd "I want to restart", etcd says "OK, hold on, I will inform Kubernetes", etcd tells Kubernetes "node 8 wants to restart, mark it as unschedulable and reschedule/restart its containers", Kubernetes answers "done", and etcd tells CoreOS "restart". Then, when CoreOS reports that the node is back in the cluster, etcd removes the unschedulable mark from the node.
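A rough sketch of how that handshake could map onto existing kubectl verbs (the node name below is an illustrative assumption, and the reboot step itself is left to whatever holds the reboot lock today):

```python
import subprocess

NODE = "node-8"  # illustrative node name, matching the example above

def kubectl(*args):
    """Run a kubectl command and fail loudly if it errors."""
    subprocess.run(["kubectl", *args], check=True)

# "mark as not for deploy": stop new pods from being scheduled onto the node
kubectl("cordon", NODE)

# "rollupdate/restart containers": evict the running pods so their
# controllers recreate them on other nodes
kubectl("drain", NODE, "--ignore-daemonsets", "--force")

# ... the reboot itself happens here (e.g. locksmith takes the reboot lock) ...

# "unmark not schedule": once the node reports back, allow scheduling again
kubectl("uncordon", NODE)
```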

@yogurtnaturalny

yogurtnaturalny commented Sep 11, 2016

This would help not only CoreOS with locksmith, but also other distros to schedule updates on their infrastructure (VPS and bare metal) and decrease downtime for services, etc.

@philips

philips commented Sep 28, 2016

We should probably not try to use the existing locksmith codebase and instead call this "klocksmith" or something. The deployment method (containers), backend (Kubernetes), etc. are all completely different here.

@philips philips changed the title locksmith: CoreOS autoupdate & Kubernetes node drain locksmith: CoreOS autoupdate & Kubernetes node drain (klocksmith) Sep 28, 2016
@snarlysodboxer

snarlysodboxer commented Oct 22, 2016

@philips I read through your doc and it looks like a great idea. I'll throw this thought out here just in case. Forgive me if I'm missing part of the picture.

What about just modifying locksmith itself to support preStop hooks? It could optionally run a command or httpGet a URL, blocking the reboot signal until that command or URL returns?

The command could obviously then be anybody's custom anything, and for the case of K8s, the command could be a simple bash script which runs kubectl drain <node name> and loops until Non-terminated Pods == 0, or something more sophisticated that hits the API directly.
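A minimal sketch of that script, in Python rather than bash for readability; the node name, poll interval, and the choice to skip DaemonSet-owned pods (which `kubectl drain --ignore-daemonsets` leaves in place) are assumptions:

```python
import json
import subprocess
import time

NODE = "worker-1"  # hypothetical node name handed to the preStop hook

# Ask Kubernetes to evict everything that can be rescheduled elsewhere.
subprocess.run(["kubectl", "drain", NODE, "--ignore-daemonsets", "--force"],
               check=True)

def owned_by_daemonset(pod):
    """DaemonSet pods survive a drain, so don't wait on them."""
    return any(ref.get("kind") == "DaemonSet"
               for ref in pod["metadata"].get("ownerReferences", []))

# Block the reboot until no non-terminated pods remain on the node.
while True:
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json",
         "--field-selector", f"spec.nodeName={NODE}"],
        check=True, capture_output=True, text=True,
    ).stdout
    remaining = [p for p in json.loads(out)["items"]
                 if p["status"].get("phase") not in ("Succeeded", "Failed")
                 and not owned_by_daemonset(p)]
    if not remaining:
        break  # node is drained; locksmith can proceed with the reboot
    time.sleep(5)
```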

@so0k

so0k commented Nov 3, 2016

Adding preStop hooks seems like a simple/quick solution to the problem at hand?

@chrissnell

I would also like to be able to prevent a node from rebooting if a ReplicaSet or StatefulSet is not running the desired number of replicas. This is to prevent downtime for an application (like some data stores) that requires a minimum number of nodes to be running.

The scenario goes like this: let's say you're running something like Elasticsearch in a StatefulSet, and one of the StatefulSet's pods experiences a fatal event, like database or disk corruption, free-space exhaustion, etc., and fails a liveness probe. It's broken and won't come up without manual intervention. The StatefulSet is now running with fewer than the desired number of replicas. If locksmith were to initiate a reboot on a node running a pod from this StatefulSet, it could compromise application/cluster availability.

We should be able to prevent node reboots when there is a compromised ReplicaSet or StatefulSet. Maybe there's a way to do this already? I don't know, but this seems like an appropriate place to mention it. We're working around this very same situation with Cassandra running under Fleet (obviously less than ideal).

@sander-su

+1, this looks great. Currently our cluster reboots entirely way too fast, leaving no time for the applications to become available again. As this currently happens during nighttime it is not that big of a problem, but it could be better.

@chrissnell for compromised ReplicaSets & StatefulSets the PodDisruptionBudget would be the indicator.
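As a rough illustration of using that indicator (assuming the `status.disruptionsAllowed` field and that the workloads in question actually define a PodDisruptionBudget), a reboot gate could look something like:

```python
import json
import subprocess

# List every PodDisruptionBudget and refuse the reboot if any of them
# currently allows zero disruptions, i.e. the workload it guards is already
# at or below its minimum healthy count.
out = subprocess.run(
    ["kubectl", "get", "poddisruptionbudgets", "--all-namespaces", "-o", "json"],
    check=True, capture_output=True, text=True,
).stdout

blocked = [
    f'{pdb["metadata"]["namespace"]}/{pdb["metadata"]["name"]}'
    for pdb in json.loads(out)["items"]
    if pdb.get("status", {}).get("disruptionsAllowed", 0) == 0
]

if blocked:
    raise SystemExit(f"reboot blocked by PodDisruptionBudgets: {blocked}")
print("all PodDisruptionBudgets allow disruptions; safe to reboot")
```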

Can someone comment on the current status?

@crawford
Contributor

We are working on a Kubernetes-aware version of locksmith (lovingly called "klocksmith"). The plan is to deploy this component (consisting of a daemon set and a controller) onto the cluster and allow that to manage the reboots. We don't have anything to announce just yet, but we are getting close.
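For the curious, here's a very rough sketch of what the controller half of such a split might do, coordinating with the per-node agent through node annotations (the annotation keys below are made up for illustration, not the operator's actual protocol):

```python
import json
import subprocess

# Hypothetical annotation keys, invented for this sketch; the real operator
# defines its own agent/controller coordination protocol.
NEEDS_REBOOT = "example.com/reboot-needed"
REBOOT_OK = "example.com/reboot-ok"

def kubectl_json(*args):
    """Run a kubectl command and return its JSON output."""
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)

def annotations(node):
    return node["metadata"].get("annotations", {})

nodes = kubectl_json("get", "nodes")["items"]

# Controller side: allow at most one node to reboot at a time.
waiting = [n for n in nodes if annotations(n).get(NEEDS_REBOOT) == "true"]
rebooting = [n for n in nodes if annotations(n).get(REBOOT_OK) == "true"]

if waiting and not rebooting:
    name = waiting[0]["metadata"]["name"]
    # Drain the node first, then hand the go-ahead to the on-node agent
    # (the daemon set) by flipping the annotation.
    subprocess.run(["kubectl", "drain", name, "--ignore-daemonsets", "--force"],
                   check=True)
    subprocess.run(["kubectl", "annotate", "node", name,
                    f"{REBOOT_OK}=true", "--overwrite"], check=True)
```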

@crawford
Contributor

For those following along, we released https://github.com/coreos/container-linux-update-operator, which replaces Locksmith in Kubernetes clusters.

@euank
Contributor

euank commented Aug 15, 2017

The Container Linux Update Operator (the new name for "klocksmith") is now deployed by default on Tectonic clusters. It should also function just as well on regular Kubernetes clusters.

For specific enhancements related to it, please open additional issues here, against Tectonic, or against Kubernetes as appropriate.


@chrissnell
Enforcing a minimum health of a StatefulSet/Deployment will best be accomplished by the pod disruption budget feature, I think... which is still in development, but once it's available the update operator should use it.

@euank euank closed this as completed Aug 15, 2017
@dghubble
Member

Yep, for example, plain-old Kubernetes clusters like the Matchbox bootkube-install example cluster (non-Tectonic) now use the Container Linux Update Operator too.

https://github.com/coreos/matchbox/blob/master/Documentation/cluster-addons.md
