I was out on a drive yesterday and pulled over to do some research on my phone, looking into etcd backup/restore options that would make running just a single control-plane node more resilient in the cloud. This looks very doable with the etcdctl CLI. We could take a full backup to S3 (or similar) every hour and log transactions in between, so the copy in S3 should be very close to up to date at all times.
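As a rough sketch of the hourly full backup, assuming `etcdctl` (v3 API) and the AWS CLI are both available on the node: the helpers below build one `etcdctl snapshot save` command and one `aws s3 cp` command and run them in order. The endpoint, certificate paths, bucket name, and file naming are placeholders I made up for illustration, not anything we've settled on, and this doesn't cover logging transactions between snapshots.

```python
import subprocess
from datetime import datetime, timezone

# Hypothetical values; adjust for the real cluster.
ENDPOINT = "https://127.0.0.1:2379"
PKI_DIR = "/etc/kubernetes/pki/etcd"  # kubeadm-style layout (an assumption)
BUCKET = "example-etcd-backups"       # placeholder bucket name


def snapshot_filename(now: datetime) -> str:
    """Name the snapshot after the UTC hour it was taken in."""
    return now.astimezone(timezone.utc).strftime("etcd-%Y%m%dT%H00Z.db")


def save_command(path: str) -> list[str]:
    """Build the etcdctl command that writes a full snapshot to `path`."""
    return [
        "etcdctl", "snapshot", "save", path,
        f"--endpoints={ENDPOINT}",
        f"--cacert={PKI_DIR}/ca.crt",
        f"--cert={PKI_DIR}/server.crt",
        f"--key={PKI_DIR}/server.key",
    ]


def upload_command(path: str, name: str) -> list[str]:
    """Build the AWS CLI command that copies the snapshot into the bucket."""
    return ["aws", "s3", "cp", path, f"s3://{BUCKET}/{name}"]


def backup(now: datetime, workdir: str = "/var/tmp") -> None:
    """Take a snapshot and upload it; raises if either command fails."""
    name = snapshot_filename(now)
    path = f"{workdir}/{name}"
    subprocess.run(save_command(path), check=True)
    subprocess.run(upload_command(path, name), check=True)
```

Something like cron or a systemd timer could call `backup(datetime.now(timezone.utc))` once an hour.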
Then if the cloud relocates the VM to a new host and there's a problem with the etcd data (or it gets corrupted some other way), we could reload it from the backup. We'd need to stop and restart etcd (and probably the API server) while we do this, but that should only take a minute or two, and whatever is currently running on the cluster will keep running, so most user-facing services shouldn't see much impact.
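The reload step might look roughly like this, again only as a sketch: pull a snapshot back down from S3 and run `snapshot restore` into a fresh data directory. The bucket name and paths are placeholders, and stopping/starting etcd and the API server is left out because how that's done depends on how the node runs them. Newer etcd releases move the restore subcommand to `etcdutl`, so the tool name is a parameter here.

```python
import subprocess

BUCKET = "example-etcd-backups"  # placeholder bucket name


def download_command(name: str, path: str) -> list[str]:
    """Build the AWS CLI command that fetches a snapshot from the bucket."""
    return ["aws", "s3", "cp", f"s3://{BUCKET}/{name}", path]


def restore_command(snapshot_path: str, data_dir: str,
                    tool: str = "etcdctl") -> list[str]:
    """Build the restore command; pass tool="etcdutl" on newer releases."""
    return [tool, "snapshot", "restore", snapshot_path,
            f"--data-dir={data_dir}"]


def restore(name: str, data_dir: str, workdir: str = "/var/tmp",
            tool: str = "etcdctl") -> None:
    """Download and restore a snapshot.

    Assumes etcd (and the API server) are already stopped and that
    data_dir is a fresh directory that etcd will be pointed at afterwards.
    """
    path = f"{workdir}/{name}"
    subprocess.run(download_command(name, path), check=True)
    subprocess.run(restore_command(path, data_dir, tool), check=True)
```

Restoring into a new directory and then pointing etcd at it keeps the old (possibly corrupted) data around in case we want to look at it later.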
We might need to do something similar when we need to upgrade etcd in the future. I did some reading about that too. etcd does support upgrades, but you need to install every version between what you have and where you want to end up, so that's a pain. So the best approach might be to:
Some thoughts and links for these topics.
Here are some links discussing this:
https://goteleport.com/blog/kubernetes-and-offline-etcd-upgrades/
https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md