Add optional checkpoint/restore statistics #12257

Merged


adrianreber
Collaborator

A recent issue asked how long it takes to create a checkpoint or to restore from one.

I added support to checkpointctl to display the statistics CRIU collects during checkpointing, but that covers only CRIU and does not include the time Podman or the runtime needs to create the checkpoint. It is also not easy to get the statistics of a restore, because the restore statistics are not stored in the exported checkpoint archive.

This PR adds the parameter --print-stats to podman container checkpoint and podman container restore, which results in output like this:

# podman container checkpoint -a --print-stats
{
    "podman_checkpoint_duration": 1393998,
    "container_statistics": [
        {
            "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20",
            "runtime_checkpoint_duration": 54713,
            "criu_statistics": {
                "freezing_time": 300,
                "frozen_time": 39340,
                "memdump_time": 609,
                "memwrite_time": 84,
                "pages_scanned": 397,
                "pages_written": 14
            }
        },
        {
            "Id": "804f09a366ec12df73a7905c89bbab2808b0e305a5b6a7ffa1a9505868a40813",
            "runtime_checkpoint_duration": 720370,
            "criu_statistics": {
                "freezing_time": 103484,
                "frozen_time": 604620,
                "memdump_time": 332468,
                "memwrite_time": 252481,
                "pages_scanned": 538174,
                "pages_written": 98300
            }
        }
    ]
}
# podman container restore -a --print-stats
{
    "podman_restore_duration": 610540,
    "container_statistics": [
        {
            "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20",
            "runtime_restore_duration": 78779,
            "criu_statistics": {
                "forking_time": 5,
                "restore_time": 48624,
                "pages_restored": 14
            }
        },
        {
            "Id": "804f09a366ec12df73a7905c89bbab2808b0e305a5b6a7ffa1a9505868a40813",
            "runtime_restore_duration": 235895,
            "criu_statistics": {
                "forking_time": 263,
                "restore_time": 207423,
                "pages_restored": 98300
            }
        }
    ]
}

All times are in microseconds, and everything below criu_statistics is taken directly from CRIU. The naming follows CRIU as closely as possible.
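
As a minimal sketch of how this output could be consumed, the following Python snippet parses the checkpoint JSON shown above and converts the microsecond values to milliseconds. The helper names `us_to_ms` and `summarize_checkpoint` are illustrative assumptions and not part of Podman; the sample data is copied from the checkpoint example in this description.

```python
import json

# Output of `podman container checkpoint -a --print-stats`, copied from above.
CHECKPOINT_SAMPLE = """
{
    "podman_checkpoint_duration": 1393998,
    "container_statistics": [
        {
            "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20",
            "runtime_checkpoint_duration": 54713,
            "criu_statistics": {
                "freezing_time": 300,
                "frozen_time": 39340,
                "memdump_time": 609,
                "memwrite_time": 84,
                "pages_scanned": 397,
                "pages_written": 14
            }
        },
        {
            "Id": "804f09a366ec12df73a7905c89bbab2808b0e305a5b6a7ffa1a9505868a40813",
            "runtime_checkpoint_duration": 720370,
            "criu_statistics": {
                "freezing_time": 103484,
                "frozen_time": 604620,
                "memdump_time": 332468,
                "memwrite_time": 252481,
                "pages_scanned": 538174,
                "pages_written": 98300
            }
        }
    ]
}
"""


def us_to_ms(microseconds):
    """Convert a duration in microseconds to milliseconds."""
    return microseconds / 1000.0


def summarize_checkpoint(raw):
    """Return (total Podman time in ms, list of per-container summaries)."""
    stats = json.loads(raw)
    rows = []
    for container in stats["container_statistics"]:
        criu = container["criu_statistics"]
        rows.append({
            "id": container["Id"][:12],  # short ID, for display only
            "runtime_ms": us_to_ms(container["runtime_checkpoint_duration"]),
            "frozen_ms": us_to_ms(criu["frozen_time"]),
            "pages_written": criu["pages_written"],
        })
    return us_to_ms(stats["podman_checkpoint_duration"]), rows


total_ms, rows = summarize_checkpoint(CHECKPOINT_SAMPLE)
print(f"podman total: {total_ms:.1f} ms")
for row in rows:
    print(f"{row['id']}: runtime {row['runtime_ms']:.1f} ms, "
          f"frozen {row['frozen_ms']:.1f} ms, "
          f"{row['pages_written']} pages written")
```

The same approach should work for the restore output by swapping in the restore key names (podman_restore_duration, runtime_restore_duration).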

@mheon
Member

mheon commented Nov 10, 2021

Looks like you have build errors?

@adrianreber adrianreber force-pushed the 2021-11-10-print-stats branch 2 times, most recently from 568aa30 to 6e4f98f on November 11, 2021 12:50
@rhatdan
Member

rhatdan commented Nov 11, 2021

LGTM
@TomSweeneyRedHat @mheon PTAL

@adrianreber adrianreber force-pushed the 2021-11-10-print-stats branch from 6e4f98f to 692f4e6 on November 11, 2021 16:07
libpod/container_api.go (review comment resolved)
libpod/container_internal.go (review comment outdated, resolved)
libpod/oci.go (review comment resolved)
pkg/api/handlers/libpod/containers.go (review comment resolved)
@mheon
Member

mheon commented Nov 11, 2021

Apparently GitHub batched and did not submit most of my comments, sorry.

This adds the parameter '--print-stats' to 'podman container checkpoint'.
With '--print-stats' Podman will measure how long Podman itself, the OCI
runtime and CRIU require to create a checkpoint and print out this
information. CRIU already creates checkpointing statistics, which are
simply read in addition to the added measurements. Instead of just
printing out the ID of the checkpointed container, Podman will now print
out JSON:

 # podman container checkpoint --latest --print-stats
 {
     "podman_checkpoint_duration": 360749,
     "container_statistics": [
         {
             "Id": "25244244bf2efbef30fb6857ddea8cb2e5489f07eb6659e20dda117f0c466808",
             "runtime_checkpoint_duration": 177222,
             "criu_statistics": {
                 "freezing_time": 100657,
                 "frozen_time": 60700,
                 "memdump_time": 8162,
                 "memwrite_time": 4224,
                 "pages_scanned": 20561,
                 "pages_written": 2129
             }
         }
     ]
 }

The output contains 'podman_checkpoint_duration', the number of
microseconds Podman required to create the checkpoint. The output also
includes 'runtime_checkpoint_duration', the time the runtime needed to
checkpoint that specific container. Each container also includes
'criu_statistics', which displays the timing information collected by
CRIU.

Signed-off-by: Adrian Reber <[email protected]>
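
The relationship between the two durations in the commit message above can be illustrated with a short Python sketch. It assumes that, for a single-container checkpoint (as with --latest), the runtime duration is contained within the Podman duration; the commit message implies this but does not state it outright. The function name `podman_overhead_us` is hypothetical.

```python
import json

# Single-container output copied from the commit message above.
CHECKPOINT_OUTPUT = """
{
    "podman_checkpoint_duration": 360749,
    "container_statistics": [
        {
            "Id": "25244244bf2efbef30fb6857ddea8cb2e5489f07eb6659e20dda117f0c466808",
            "runtime_checkpoint_duration": 177222,
            "criu_statistics": {
                "freezing_time": 100657,
                "frozen_time": 60700,
                "memdump_time": 8162,
                "memwrite_time": 4224,
                "pages_scanned": 20561,
                "pages_written": 2129
            }
        }
    ]
}
"""


def podman_overhead_us(raw):
    """Return the microseconds Podman spent outside the OCI runtime.

    Assumption: exactly one container was checkpointed, so its
    runtime_checkpoint_duration is nested inside podman_checkpoint_duration.
    """
    stats = json.loads(raw)
    containers = stats["container_statistics"]
    if len(containers) != 1:
        raise ValueError("this sketch only handles a single container")
    runtime_us = containers[0]["runtime_checkpoint_duration"]
    return stats["podman_checkpoint_duration"] - runtime_us


overhead = podman_overhead_us(CHECKPOINT_OUTPUT)
print(f"time outside the runtime: {overhead} microseconds")
```

With several containers (for example with -a) the subtraction is less meaningful, because the output does not say whether the per-container runtime durations overlap, which is why the sketch rejects that case.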
This adds the parameter '--print-stats' to 'podman container restore'.
With '--print-stats' Podman will measure how long Podman itself, the OCI
runtime and CRIU require to restore a checkpoint and print out this
information. CRIU already creates process restore statistics, which are
simply read in addition to the added measurements. Instead of just
printing out the ID of the restored container, Podman will now print
out JSON:

 # podman container restore --latest --print-stats
 {
     "podman_restore_duration": 305871,
     "container_statistics": [
         {
             "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20",
             "runtime_restore_duration": 140614,
             "criu_statistics": {
                 "forking_time": 5,
                 "restore_time": 67672,
                 "pages_restored": 14
             }
         }
     ]
 }

The output contains 'podman_restore_duration', the number of
microseconds Podman required to restore the checkpoint. The output also
includes 'runtime_restore_duration', the time the runtime needed to
restore that specific container. Each container also includes
'criu_statistics', which displays the timing information collected by
CRIU.

Signed-off-by: Adrian Reber <[email protected]>
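
For the restore output in the commit message above, a comparable Python sketch can relate CRIU's restore_time to the runtime's total. This assumes restore_time is a portion of runtime_restore_duration, which is a reading of the commit message rather than something it guarantees; the function name `criu_restore_share` is hypothetical.

```python
import json

# Output copied from the restore commit message above.
RESTORE_OUTPUT = """
{
    "podman_restore_duration": 305871,
    "container_statistics": [
        {
            "Id": "47b02e1d474b5d5fe917825e91ac653efa757c91e5a81a368d771a78f6b5ed20",
            "runtime_restore_duration": 140614,
            "criu_statistics": {
                "forking_time": 5,
                "restore_time": 67672,
                "pages_restored": 14
            }
        }
    ]
}
"""


def criu_restore_share(raw):
    """Map each container ID to restore_time / runtime_restore_duration.

    Assumption: CRIU's restore_time is part of the runtime's duration,
    so the ratio falls between 0 and 1.
    """
    stats = json.loads(raw)
    shares = {}
    for container in stats["container_statistics"]:
        runtime_us = container["runtime_restore_duration"]
        criu_us = container["criu_statistics"]["restore_time"]
        shares[container["Id"]] = criu_us / runtime_us
    return shares


shares = criu_restore_share(RESTORE_OUTPUT)
for container_id, share in shares.items():
    print(f"{container_id[:12]}: CRIU restore is {share:.0%} of runtime time")
```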
This commit updates the man pages for checkpoint and restore to describe
the '--print-stats' parameter.

Signed-off-by: Adrian Reber <[email protected]>
@adrianreber adrianreber force-pushed the 2021-11-10-print-stats branch from 692f4e6 to d28b39a on November 15, 2021 11:50
@adrianreber
Collaborator Author

I think I addressed all review comments.

@mheon
Member

mheon commented Nov 15, 2021

LGTM

@rhatdan
Member

rhatdan commented Nov 15, 2021

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 15, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianreber, rhatdan

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2021
@openshift-merge-robot openshift-merge-robot merged commit d40736f into containers:main Nov 15, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023