
[Feature] Create/Restore Cluster Snapshots #160

Open
cfontes opened this issue Jan 2, 2020 · 8 comments
Labels
difficulty/high · enhancement (New feature or request) · help wanted (Extra attention is needed)
Milestone

Comments


cfontes commented Jan 2, 2020

Scope of your request

Be able to create snapshots for complex clusters and restore them at will

I think this would be very useful for clusters with StatefulSets that take a long time to create. In my case, my local Kafka + ZooKeeper cluster takes around 10 minutes to be fully configured and populated, but I only need to do that once every couple of months.

Describe the solution you'd like

This project is extremely helpful. I opted to use it instead of plain k3s because I saw the possibility of using docker commit as a snapshot tool, so I could iterate fast.
If I break something I don't care too much about, I could just restart from the snapshot I committed and quickly get back to adding my bugs to my code base again.

If it were a k3d-native command it would be perfect, but plain docker is fine for now.

Describe alternatives you've considered

I tried and succeeded in creating a snapshot of a working k3d cluster with

docker commit -m "snapshot" "$(docker ps --filter name=k3d-k3s-local-server -q)" rancher/k3s:v0.10.0-snapshot

After that I ran

k3d delete -a

and

docker run 53cb9ed4ec58

but I failed to restore my cluster to its initial state.

I can create a PR for this later, but I need some guidance on what has to be done for this kind of approach to succeed.

For a start, this docker commit and docker run approach would already be very useful if it worked.

The current error I see when starting a single-server cluster with no agents is:

Failed to get the info of the filesystem with mountpoint "/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs": unable to find data in memory cache.

So I am missing some mount point. I am just not sure what I need to recreate manually; this seems related to k3s-io/k3s#495. I guess k3d delete is removing this mount.
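A quick way to confirm this (just a sketch; the container name is taken from the docker commit command above and may differ on your setup) is to look at the mounts of the running server container before deleting it:

    # Show which paths of the k3d server container are backed by (anonymous) volumes.
    docker inspect --format '{{ json .Mounts }}' k3d-k3s-local-server

    # Anything listed with "Type": "volume" lives outside the container's writable
    # layer, so `docker commit` does not capture it, and deleting the container can
    # remove the anonymous volume (and its data) along with it.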

@cfontes added the enhancement (New feature or request) label Jan 2, 2020
@iwilltry42 added the help wanted (Extra attention is needed) label Jan 3, 2020
@iwilltry42 changed the title from "Create feature to create cluster snapshots and restore it" to "[Feature] Create/Restore Cluster Snapshots" Jan 3, 2020
@iwilltry42 added this to the v2.0 milestone Jan 3, 2020
@iwilltry42 (Member) commented

Hi there, thanks for opening this issue.
This is surely an interesting feature to have 👍
To be honest, I'm not sure how to proceed to get this working.
The mountpoint that you're missing there is a subdirectory of one of the volumes created within the k3s Dockerfile (see https://github.com/rancher/k3s/blob/master/package/Dockerfile).

I'd be happy to review any pull request from your side and will have another look into this issue once I have some more time 👍
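For reference, a sketch of how to see which paths the k3s image declares as volumes (the exact tag is an assumption based on the v0.10.0 image mentioned above); data under these paths ends up in anonymous volumes and is therefore not part of a committed image:

    # List the VOLUME declarations baked into the k3s image.
    docker image inspect rancher/k3s:v0.10.0 --format '{{ json .Config.Volumes }}'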


cfontes commented Jan 13, 2020

OK, I will do my best. Let's see what happens.

@iwilltry42 removed this from the v2.0 milestone Apr 21, 2020

arkocal commented Jan 29, 2021

Is there any progress on this, or on something similar? I would also be interested in this functionality. If not, I would be interested in giving it a try as well, though I couldn't get to it within the next 2-3 weeks.


cfontes commented Jan 29, 2021

Please do. I have too much on my plate right now (2 jobs since Dec 2020), so unfortunately I couldn't do anything.

@iwilltry42 added this to the v4.3.0 milestone Feb 5, 2021
@iwilltry42 modified the milestones: v4.4.5, v5.0.0 Jun 11, 2021
@iwilltry42 (Member) commented

I just had a few more thoughts on this; here are some things to note:

  • For simple single-server clusters (at least without agents), it's enough to do

    docker volume create k3d-test
    k3d cluster create k3d-test -v k3d-test:/var/lib/rancher/k3s
    # ... do something with the cluster ...
    k3d cluster delete k3d-test
    k3d cluster create k3d-test -v k3d-test:/var/lib/rancher/k3s

    to get the same state as before.
    This also works by docker cp'ing the contents of that directory out and then either copying them back into place or bind-mounting the directory (see the sketch after this list).

    • Problem: if you change the cluster name when re-creating the cluster, the containers will show up as running, but the pods are still assigned to the original node name and the original node will also show up in kubectl get nodes, making the pods inaccessible, e.g. via kubectl exec. All pods then have to be re-created (e.g. kubectl delete pods -A --all).
  • In a multi-server cluster, the new nodes have to get exactly the same IP range as the old nodes, because etcd internally uses the node IPs as identifiers; otherwise the new cluster created from the backed-up files will break.
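A minimal sketch of the docker cp variant mentioned above (the cluster, container, and backup path names are assumptions based on k3d's default naming, where the server container of a cluster named test is called k3d-test-server-0):

    # Copy the k3s data directory out of the running server container.
    docker cp k3d-test-server-0:/var/lib/rancher/k3s ./k3s-backup

    # Tear the cluster down and re-create it with the backup bind-mounted into place.
    k3d cluster delete test
    k3d cluster create test -v "$(pwd)/k3s-backup:/var/lib/rancher/k3s"

As noted above, this only works reliably if the cluster name stays the same, and in a multi-server setup the node IPs have to match as well.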

@iwilltry42 modified the milestones: v5.0.0, v5.1.0 Jul 21, 2021

cfontes commented Jul 21, 2021

I will give it a try! That would be enough for me, since k3s is our local env and only has one node.

@iwilltry42 modified the milestones: v5.1.0, v5.2.0 Nov 5, 2021
@iwilltry42 modified the milestones: v5.3.0, Backlog Dec 20, 2021
@iwilltry42 (Member) commented

@cfontes, did you have any success so far?
I've moved this to the backlog now instead of just moving it from milestone to milestone... 🤔


masantiago commented Oct 29, 2023

@iwilltry42 I tried your single-cluster proposal, but when creating the cluster again, k3d complains as follows:

WARN[0002] warning: encountered fatal log from node k3d-kassio-server-0 (retrying 0/10): time="2023-10-29T18:40:47Z" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"

at least with these versions:

k3d version v5.6.0
k3s version v1.27.4-k3s1 (default)
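One guess at the cause (an assumption on my side, not verified): k3s generates a new random cluster token on every k3d cluster create, while the reused volume still contains bootstrap data encrypted with the old token. Below is a sketch of pinning the token so it stays identical across re-creations, assuming k3d's --token flag and reusing the cluster name from the log above (volume name and token value are placeholders):

    # First creation: use a fixed token instead of a generated one.
    docker volume create k3d-kassio
    k3d cluster create kassio -v k3d-kassio:/var/lib/rancher/k3s --token mysecrettoken

    # Later: delete and re-create with the SAME volume and the SAME token.
    k3d cluster delete kassio
    k3d cluster create kassio -v k3d-kassio:/var/lib/rancher/k3s --token mysecrettoken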

On the other hand, I am also trying to simply snapshot the server container and use that as the image for creating the new cluster (--image option). However, it seems to ignore what's inside and boots an empty k3s cluster.
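For completeness, a sketch of that commit-and-reuse attempt (container name and image tag are made up); as discussed earlier in this issue, data stored in the image's declared volumes is not part of a committed image, which would explain why the resulting cluster comes up empty:

    # Snapshot the running server container into a new image.
    docker commit -m "snapshot" k3d-kassio-server-0 k3s-snapshot:local

    # Re-create the cluster from the committed image.
    k3d cluster delete kassio
    k3d cluster create kassio --image k3s-snapshot:local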
