Mark invalid snapshots as obsolete #720

Open
alexander-demicev opened this issue Sep 4, 2024 · 5 comments · May be fixed by #904
Assignees
Labels
area/etcdsnapshot-restore Categorizes issue or PR as related to Turtles ETCD Snapshot & Restore feature kind/enhancement Categorizes issue or PR as related to a new feature.

Comments

@alexander-demicev
Member

alexander-demicev commented Sep 4, 2024

We need to mark invalid snapshots as obsolete and prevent obsolete snapshots from being used when performing a restore. A snapshot is obsolete when:

  • cluster version is different from when the snapshot was made
  • cluster configuration changed (RKE2ControlPlane.Spec or RKE2ConfigTemplate.Spec)
  • cluster the snapshot was made for doesn't exist anymore
  • RKE2 control plane replicas didn't change
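
The conditions above could be checked roughly as in the following sketch. It is not the Turtles implementation: `ClusterState`, `config_hash`, and `obsolete_reasons` are hypothetical names introduced for illustration, and the last bullet is read here as "the replica count changed since the snapshot was taken", which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical summary of the cluster fields listed in the issue."""
    version: str      # e.g. "v1.31.0+rke2r1"
    config_hash: str  # assumed hash over RKE2ControlPlane.Spec and RKE2ConfigTemplate.Spec
    replicas: int     # RKE2 control plane replica count


def obsolete_reasons(at_snapshot: ClusterState,
                     current: Optional[ClusterState]) -> List[str]:
    """Return why a snapshot is obsolete; an empty list means it is still usable."""
    if current is None:
        # The cluster the snapshot was made for no longer exists.
        return ["cluster no longer exists"]
    reasons = []
    if current.version != at_snapshot.version:
        reasons.append("cluster version changed")
    if current.config_hash != at_snapshot.config_hash:
        reasons.append("cluster configuration changed")
    if current.replicas != at_snapshot.replicas:
        # Assumed reading of the last bullet in the list.
        reasons.append("control plane replicas changed")
    return reasons
```

A restore path could then refuse to proceed whenever `obsolete_reasons(...)` returns a non-empty list.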
@alexander-demicev alexander-demicev added the area/etcdsnapshot-restore Categorizes issue or PR as related to Turtles ETCD Snapshot & Restore feature label Sep 4, 2024
@alexander-demicev alexander-demicev moved this to Backlog in CAPI / Turtles Sep 4, 2024
@alexander-demicev alexander-demicev moved this from Backlog to CAPI Backlog in CAPI / Turtles Sep 4, 2024
@alexander-demicev alexander-demicev self-assigned this Sep 5, 2024
@alexander-demicev alexander-demicev moved this from CAPI Backlog to In Progress (8 max) in CAPI / Turtles Sep 5, 2024
@alexander-demicev alexander-demicev moved this to CAPI Backlog in CAPI / Turtles Sep 5, 2024
@alexander-demicev alexander-demicev removed their assignment Sep 5, 2024
@vatsalparekh vatsalparekh self-assigned this Nov 11, 2024
@kkaempf kkaempf added the kind/enhancement Categorizes issue or PR as related to a new feature. label Nov 12, 2024
@Danil-Grigorev
Contributor

I think the last point is already addressed by #841, which maintains an up-to-date list of snapshots:

  • cluster the snapshot was made for doesn't exist anymore

@alexander-demicev alexander-demicev moved this from CAPI Backlog to In Progress (8 max) in CAPI / Turtles Nov 29, 2024
@alexander-demicev alexander-demicev moved this from In Progress (8 max) to CAPI Backlog in CAPI / Turtles Jan 21, 2025
@vatsalparekh vatsalparekh linked a pull request Jan 23, 2025 that will close this issue
4 tasks
@yiannistri yiannistri moved this from Team Backlog to In Progress (8 max) in CAPI / Turtles Feb 17, 2025
@yiannistri yiannistri self-assigned this Feb 17, 2025
@yiannistri
Contributor

yiannistri commented Feb 17, 2025

I have tested the following scenario:

Deploy cluster at version 1.31 -> Take S3 snapshots -> Update cluster to 1.32 -> Restore S3 snapshots from 1.31. In this case the restore seems to have no effect: the control plane nodes run different versions for a short period of time, but eventually the nodes running the previous version become NotReady, possibly because the CAPI controllers continue to reconcile the cluster to the desired version (1.32):
Before restore

kubectl --kubeconfig rke2.kubeconfig get nodes
NAME                       STATUS   ROLES                       AGE     VERSION
rke2-control-plane-2kjpf   Ready    control-plane,etcd,master   7m42s   v1.32.0+rke2r1
rke2-md-0-sndmg-x9cdk      Ready    <none>                      3m11s   v1.32.0+rke2r1

After restore

kubectl --kubeconfig rke2.kubeconfig get nodes
NAME                       STATUS   ROLES                       AGE   VERSION
rke2-control-plane-2kjpf   Ready    control-plane,etcd,master   44s   v1.32.0+rke2r1
rke2-control-plane-s9t5j   Ready    control-plane,etcd,master   24m   v1.31.0+rke2r1
rke2-md-0-68s2w-rg427      Ready    <none>                      23m   v1.31.0+rke2r1

kubectl --kubeconfig rke2.kubeconfig get nodes
NAME                       STATUS     ROLES                       AGE     VERSION
rke2-control-plane-2kjpf   Ready      control-plane,etcd,master   2m14s   v1.32.0+rke2r1
rke2-control-plane-s9t5j   NotReady   control-plane,etcd,master   26m     v1.31.0+rke2r1
rke2-md-0-68s2w-rg427      NotReady   <none>                      24m     v1.31.0+rke2r1

It's worth noting that we currently only support creating local snapshots from the management cluster, so restoring S3 snapshots manually (as I've done above) rather than through the management cluster may be out of scope for the time being.
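
To spot this kind of version skew quickly, the node listings above can be grouped by their VERSION column. The snippet below is only a sketch: `nodes_by_version` is a hypothetical helper, and it assumes the default `kubectl get nodes` table layout, where no column value contains whitespace.

```python
from collections import defaultdict
from typing import Dict, List


def nodes_by_version(kubectl_output: str) -> Dict[str, List[str]]:
    """Group node names by the VERSION column of `kubectl get nodes` output."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_col, version_col = header.index("NAME"), header.index("VERSION")
    groups = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        groups[fields[version_col]].append(fields[name_col])
    return dict(groups)


# Last listing from the comment above, pasted verbatim.
sample = """
NAME                       STATUS     ROLES                       AGE     VERSION
rke2-control-plane-2kjpf   Ready      control-plane,etcd,master   2m14s   v1.32.0+rke2r1
rke2-control-plane-s9t5j   NotReady   control-plane,etcd,master   26m     v1.31.0+rke2r1
rke2-md-0-68s2w-rg427      NotReady   <none>                      24m     v1.31.0+rke2r1
"""

groups = nodes_by_version(sample)
has_version_skew = len(groups) > 1  # more than one Kubernetes version present
```

With the sample above, the nodes split into a v1.32.0+rke2r1 group and a v1.31.0+rke2r1 group, so `has_version_skew` is true.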

@yiannistri
Contributor

yiannistri commented Feb 17, 2025

Another scenario that I've tested:

Deploy cluster at version 1.31 -> Take local snapshot from mgmt cluster via the ETCDMachineSnapshot resource -> Restore snapshot from mgmt cluster via the ETCDSnapshotRestore resource. This time the snapshot restore worked as expected and the cluster was fully operational after the restore.

@yiannistri
Contributor

In general, since the current implementation of ETCD snapshot/restore in Turtles only supports local snapshots, it doesn't make a lot of sense to test the scenarios above:

  • Updating the version of Kubernetes results in a rolling update that creates new nodes and deletes old CP nodes (and therefore any local snapshots)
  • Without S3 support, it's hard to restore a snapshot from a previous cluster into a new cluster

For the reasons above, it probably makes sense to close this issue and create a new one once we have S3 support implemented, wdyt @alexander-demicev?

@yiannistri yiannistri moved this from In Progress (8 max) to Team Backlog in CAPI / Turtles Feb 20, 2025
@yiannistri yiannistri removed their assignment Feb 20, 2025
@yiannistri
Contributor

Moving back to backlog until we make a firm decision on the future of this.

Projects
Status: Team Backlog

5 participants