Add support for checkpoint image #13505

Merged
merged 3 commits into from
Apr 21, 2022

Conversation

rst0git
Contributor

@rst0git rst0git commented Mar 14, 2022

This is an enhancement proposal for the checkpoint / restore feature of Podman that enables container migration across multiple systems with standard image distribution infrastructure.

A new option --create-image <image> has been added to the podman container checkpoint command. This option tells Podman to create a container image: a standard image with a single layer, a tar archive that contains all checkpoint files. This is similar to the current approach for container migration using --export / --import.

The checkpoint image can be pushed to a container registry and pulled on a different system. It can also be exported locally with podman image save and inspected with podman inspect. Inspecting the image would display additional information (stored as annotations) about the host and the versions of Podman, criu, crun/runc, kernel, etc.
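
As a rough sketch of the inspection step described above, the annotations of a checkpoint image could be printed as JSON with a small shell helper. The function name `show_checkpoint_annotations` and the `PODMAN` override variable are assumptions made for illustration and are not part of this PR; the helper also assumes that the image inspect output exposes an `Annotations` field.

```shell
# Sketch: print the annotations stored on a checkpoint image as JSON.
# PODMAN is a hypothetical override hook (it defaults to the real binary),
# so the generated command line can be previewed with PODMAN=echo.
show_checkpoint_annotations() {
    image="$1"
    "${PODMAN:-podman}" image inspect --format '{{json .Annotations}}' "$image"
}

# Usage (needs a checkpoint image in local storage):
#   show_checkpoint_annotations checkpoint-image-test
```

Running the helper with `PODMAN=echo` only prints the command it would execute, which is handy for checking the invocation on a machine without a checkpoint image.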

podman container restore has also been extended to support checkpoint image provided as input.

Simple example:

podman run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint --create-image checkpoint-image-test looper
podman rm looper
podman container restore checkpoint-image-test

Example with multiple containers:

podman run -d --name looper-1 busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman run -d --name looper-2 busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint --create-image checkpoint-1 looper-1
podman container checkpoint --create-image checkpoint-2 looper-2
podman rm looper-1 looper-2
podman container restore checkpoint-1 checkpoint-2

Example using container registry:

podman run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint --create-image quay.io/rst0git/checkpoint-image-test-1 looper
podman push quay.io/rst0git/checkpoint-image-test-1
podman rm -a
podman rmi -a
podman pull quay.io/rst0git/checkpoint-image-test-1
podman inspect quay.io/rst0git/checkpoint-image-test-1
podman container restore quay.io/rst0git/checkpoint-image-test-1

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 14, 2022
@vrothberg
Member

@adrianreber PTAL

@adrianreber
Collaborator

Overall I like this approach a lot. This is also very helpful when looking at our upcoming CRI-O changes. With something like this we might get container restore in Kubernetes without Kubernetes knowing that it is a restore. At some point, not now, this can be enhanced to have podman run registry/image:latest automatically do a restore if the image is a checkpoint image.

@rst0git Do you also store the name of the container engine (in this case Podman) in the image metadata? Just to make sure we have something in the image to quickly see if it is an image that we can also restore in CRI-O. I have already restored Podman checkpoints in CRI-O using the existing checkpoint archive. My assumption is that it should also be possible with this image format.

libpod/container_internal_linux.go Outdated
libpod/container_internal_linux.go Outdated
libpod/container_internal_linux.go Outdated
libpod/container_internal_linux.go Outdated
libpod/container_internal_linux.go Outdated
utils/utils.go Outdated
utils/utils.go Outdated
utils/utils.go Outdated
cmd/podman/containers/restore.go
pkg/checkpoint/checkpoint_restore.go
@rst0git
Contributor Author

rst0git commented Mar 16, 2022

Do you also store the name of the container engine (in this case Podman) in the image metadata?

@adrianreber Not explicitly, but the annotation keys contain the name of the container engine (e.g., io.podman.annotations.).
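
A minimal sketch of how another tool might use this convention: test whether an annotation key falls under the `io.podman.annotations.` prefix mentioned above. The helper name `is_podman_annotation` and the example key are hypothetical; the actual key names are not listed in this conversation.

```shell
# Return 0 if an annotation key lives under the Podman annotation prefix
# quoted in the comment above, 1 otherwise.
is_podman_annotation() {
    case "$1" in
        io.podman.annotations.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the key below is a made-up placeholder, not a real annotation):
#   is_podman_annotation "io.podman.annotations.example" && echo "Podman key"
```

Matching on the prefix alone only hints at which engine created the image; it is not an explicit engine field.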

@rst0git rst0git force-pushed the checkpoint-image-1 branch 5 times, most recently from 8335bc8 to 0a5637d Compare March 18, 2022 14:52
@rst0git rst0git force-pushed the checkpoint-image-1 branch from 0a5637d to 6a8c505 Compare March 22, 2022 08:58
@rst0git rst0git marked this pull request as ready for review March 22, 2022 14:57
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2022
@rst0git rst0git requested review from vrothberg and Luap99 March 22, 2022 14:58
@rst0git rst0git force-pushed the checkpoint-image-1 branch 4 times, most recently from 8899939 to 011c6df Compare March 23, 2022 23:17
@rst0git rst0git marked this pull request as draft March 24, 2022 05:40
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2022
@rst0git rst0git force-pushed the checkpoint-image-1 branch 7 times, most recently from 33087f6 to 786dfad Compare March 26, 2022 15:31
libpod/container_internal_linux.go Outdated
libpod/container_internal_linux.go Outdated
libpod/container_internal_linux.go Outdated
cmd/podman/containers/checkpoint.go Outdated
@rst0git rst0git force-pushed the checkpoint-image-1 branch 3 times, most recently from 05607b7 to d9b57da Compare April 12, 2022 16:37
@rst0git rst0git requested review from Luap99 and vrothberg April 12, 2022 16:38
@rst0git rst0git force-pushed the checkpoint-image-1 branch 3 times, most recently from 002bc0f to bb42fcc Compare April 14, 2022 08:21
@rst0git
Contributor Author

rst0git commented Apr 14, 2022

I would be happy to squash the commits, but Build Each Commit fails when a single commit increases the size of bin/podman by more than 51200 bytes: https://cirrus-ci.com/task/6191166545723392?logs=main#L144

We can override this with a label.

@Luap99 Would it be possible to add a label to override this requirement?
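
For illustration only, the size rule mentioned in this comment can be approximated as below. This is not the project's CI script: the function name `size_growth_ok` and the `MAX_GROWTH` variable are hypothetical, and only the 51200-byte limit comes from the comment.

```shell
# Limit quoted in the comment above, in bytes.
MAX_GROWTH=51200

# Succeed if the binary grew by no more than the limit between two builds.
# Arguments: old size, new size (both in bytes), optional custom limit.
size_growth_ok() {
    old="$1"
    new="$2"
    growth=$((new - old))
    [ "$growth" -le "${3:-$MAX_GROWTH}" ]
}

# Usage (old/podman is a hypothetical path to the previous build):
#   size_growth_ok "$(wc -c < old/podman)" "$(wc -c < bin/podman)" \
#       || echo "bin/podman grew by more than $MAX_GROWTH bytes"
```

Vendoring a new dependency in a single commit can easily exceed such a limit, which is why squashing the commits runs into this check.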

@TomSweeneyRedHat
Member

@Luap99 is this GTG? PTAL

rst0git added 3 commits April 20, 2022 18:52
The changes in this commit have been generated with the following
commands:

    go get github.com/checkpoint-restore/checkpointctl
    make vendor

Signed-off-by: Radostin Stoyanov <[email protected]>
This is an enhancement proposal for the checkpoint / restore feature of
Podman that enables container migration across multiple systems with
standard image distribution infrastructure.

A new option `--create-image <image>` has been added to the
`podman container checkpoint` command. This option tells Podman to
create a container image.  This is a standard image with a single layer,
a tar archive that contains all checkpoint files. This is similar to
the current approach with checkpoint `--export`/`--import`.

This image can be pushed to a container registry and pulled on a
different system.  It can also be exported locally with `podman image
save` and inspected with `podman inspect`. Inspecting the image would
display additional information about the host and the versions of
Podman, criu, crun/runc, kernel, etc.

`podman container restore` has also been extended to support image
name or ID as input.

Suggested-by: Adrian Reber <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
The patch introduces the following test cases:

1. An attempt to checkpoint a container that does not exist should fail.
2. Checkpoint of a running container with --create-image should create a
   checkpoint image.
3. A single checkpoint image can be used to restore multiple containers,
   each with a different name.
4. Restoring multiple containers from checkpoint images with a single
   restore command.

Signed-off-by: Radostin Stoyanov <[email protected]>
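
Test case 3 above (one checkpoint image restored into several containers with different names) could be exercised by hand with a helper along these lines. This is a sketch, not the test code from the patch: the function name `restore_clones` and the `PODMAN` override variable are hypothetical, and it assumes that `--name` on `podman container restore` accepts a checkpoint image as input.

```shell
# Restore one checkpoint image into several containers, one per given name.
# PODMAN is a hypothetical override hook (it defaults to the real binary),
# so the generated commands can be previewed with PODMAN=echo.
restore_clones() {
    image="$1"
    shift
    for name in "$@"; do
        "${PODMAN:-podman}" container restore --name "$name" "$image" || return 1
    done
}

# Usage (needs an existing checkpoint image):
#   restore_clones checkpoint-image-test looper-a looper-b
```

The loop stops at the first failed restore so that a name conflict or a missing image is reported immediately.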
@rst0git rst0git force-pushed the checkpoint-image-1 branch from bb42fcc to bbe1063 Compare April 20, 2022 17:55
Member

@vrothberg vrothberg left a comment

LGTM, thanks!

@Luap99 PTAL

@rhatdan
Member

rhatdan commented Apr 21, 2022

/approve

@openshift-ci
Contributor

openshift-ci bot commented Apr 21, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan, rst0git

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2022
@Luap99
Member

Luap99 commented Apr 21, 2022

/lgtm

Sorry for the delay.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2022
@openshift-merge-robot openshift-merge-robot merged commit cb09c26 into containers:main Apr 21, 2022
@rst0git rst0git deleted the checkpoint-image-1 branch April 21, 2022 16:04
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
7 participants