ETCD backup/restore & cluster upgrade #1805

Open
jefflill opened this issue Jun 20, 2023 · 0 comments
Labels
neon-kube Related to our Kubernetes distribution

Comments

@jefflill
Collaborator

Some thoughts and links for these topics.


I was out on a drive yesterday and pulled over to do some research on my phone, looking into ETCD backup/restore solutions that would make a cluster with just a single control-plane node more resilient in the cloud. This looks very doable using the etcdctl CLI. We could do a full backup to S3 (or similar storage) every hour and log transactions in between, so the copy in S3 should be very close to up to date at all times.
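
A rough sketch of what the hourly full backup could look like, assuming `etcdctl snapshot save` for the snapshot and the AWS CLI (`aws s3 cp`) for the upload. The endpoint and certificate paths are kubeadm-style defaults and are assumptions here, as are the bucket name and the helper names; the transaction logging between snapshots is not covered.

```python
import datetime
import subprocess

# Assumed kubeadm-style endpoint and certificate paths; neon-kube may differ.
ETCD_FLAGS = [
    "--endpoints=https://127.0.0.1:2379",
    "--cacert=/etc/kubernetes/pki/etcd/ca.crt",
    "--cert=/etc/kubernetes/pki/etcd/server.crt",
    "--key=/etc/kubernetes/pki/etcd/server.key",
]


def snapshot_name(now):
    """Return a sortable snapshot file name for the given UTC time."""
    return now.strftime("etcd-%Y%m%dT%H%M%SZ.db")


def snapshot_save_cmd(path):
    """Build the argv for `etcdctl snapshot save <path>`."""
    return ["etcdctl", *ETCD_FLAGS, "snapshot", "save", path]


def upload_cmd(path, bucket, key):
    """Build the argv for copying the snapshot to S3 with the AWS CLI."""
    return ["aws", "s3", "cp", path, f"s3://{bucket}/{key}"]


def backup_once(bucket, workdir="/var/tmp"):
    """Take one snapshot and upload it; meant to be run hourly (cron, timer, etc.)."""
    name = snapshot_name(datetime.datetime.now(datetime.timezone.utc))
    path = f"{workdir}/{name}"
    subprocess.run(snapshot_save_cmd(path), check=True)
    subprocess.run(upload_cmd(path, bucket, name), check=True)
    return name
```

Calling `backup_once("my-etcd-backups")` (hypothetical bucket) from an hourly job would leave timestamped snapshots in S3 that sort chronologically by name.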

Then if the cloud relocates the VM to a new host and there's a problem with the ETCD data (or it gets corrupted some other way), we could reload the ETCD data from the backup. We'd need to stop and restart ETCD (and probably the API server) while we do this, but that should only take a minute or two, and whatever is currently running on the cluster will keep running, so most user-facing services shouldn't see much impact.
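
The restore could be sketched as an ordered plan like the one below. It assumes `etcdctl snapshot restore` with `--data-dir` (newer ETCD releases also ship this as `etcdutl snapshot restore`), and it moves the old data directory aside first because the restore expects the target directory not to exist. How ETCD and the API server are actually stopped and started depends on how the node runs them, so those commands are passed in; `restore_plan` and `run_plan` are hypothetical helper names.

```python
import subprocess


def restore_plan(snapshot, data_dir, stop_cmd, start_cmd):
    """Return the ordered commands for restoring a single-member ETCD."""
    return [
        stop_cmd,                                  # stop ETCD (and the API server)
        ["mv", data_dir, data_dir + ".bad"],       # keep the suspect data for inspection
        ["etcdctl", "snapshot", "restore", snapshot, "--data-dir", data_dir],
        start_cmd,                                 # bring ETCD (and the API server) back
    ]


def run_plan(plan):
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

For example, on a node where ETCD happens to run as a systemd unit (an assumption), the plan would be `restore_plan("/var/tmp/etcd.db", "/var/lib/etcd", ["systemctl", "stop", "etcd"], ["systemctl", "start", "etcd"])`.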

We might need to do something similar when we need to upgrade ETCD in the future. I did some reading about that too. ETCD does support upgrades, but you need to install every intermediate version of ETCD between what you have and where you eventually want to be, so that's a pain. The best approach might be to:

  1. shut down the API servers on all masters
  2. back up ETCD on each of the masters
  3. upgrade ETCD with no data
  4. restore the backup
  5. restart the API servers
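
For step 4 on a multi-master cluster, each member needs its own restore command. The sketch below builds one, assuming the `--name`, `--initial-cluster`, `--initial-cluster-token`, `--initial-advertise-peer-urls` and `--data-dir` flags of `etcdctl snapshot restore`. The member names, IP addresses, peer port 2380, data directory and token value are all illustrative assumptions. As far as I can tell from the ETCD recovery docs, every member should be restored from the same snapshot file, so that is what this assumes.

```python
def member_restore_cmd(snapshot, name, peers,
                       data_dir="/var/lib/etcd", token="etcd-upgrade"):
    """Build the restore argv for one member.

    peers maps each member name to its IP address (hypothetical values).
    """
    # Every member must be given the same full cluster membership list.
    initial_cluster = ",".join(
        f"{n}=https://{ip}:2380" for n, ip in sorted(peers.items())
    )
    return [
        "etcdctl", "snapshot", "restore", snapshot,
        "--name", name,
        "--initial-cluster", initial_cluster,
        "--initial-cluster-token", token,
        "--initial-advertise-peer-urls", f"https://{peers[name]}:2380",
        "--data-dir", data_dir,
    ]


# Example: print the command each master would run against the same snapshot.
peers = {"master-0": "10.0.0.10", "master-1": "10.0.0.11", "master-2": "10.0.0.12"}
for member in sorted(peers):
    print(" ".join(member_restore_cmd("/var/tmp/etcd.db", member, peers)))
```

Using a fresh `--initial-cluster-token` for the restored cluster keeps the new members from accidentally joining anything left over from the old one.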

Here are some links discussing this:

https://goteleport.com/blog/kubernetes-and-offline-etcd-upgrades/
https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md

@jefflill jefflill added the neon-kube Related to our Kubernetes distribution label Jun 20, 2023