Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework pruning to report reclaimed space #8831

Conversation

bblenard
Copy link

This change adds code to report the reclaimed space after a prune.
Reclaimed space from volumes, images, and containers is recorded
during the prune call in a PruneReport struct. These structs are
collected into a slice during a system prune and processed afterwards
to calculate the total reclaimed space.

Closes #8658

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 24, 2020
@bblenard bblenard force-pushed the issue-8658-system-prune-reclaimed-space branch from edf065b to 0c1971c Compare December 24, 2020 22:11
@zhangguanzhang
Copy link
Collaborator

zhangguanzhang commented Dec 25, 2020

Thansk for contribute
need rebase to resolve the Conflicting files at first

@bblenard bblenard force-pushed the issue-8658-system-prune-reclaimed-space branch from 0c1971c to 7b647bc Compare December 26, 2020 18:19
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 26, 2020
@bblenard
Copy link
Author

NOTE: Still working out tests on this PR. Running some locally on my machine had some failures but I'm not sure if those were because of how I was running them 🤷

@bblenard bblenard force-pushed the issue-8658-system-prune-reclaimed-space branch 10 times, most recently from dc3d5cd to 4ee8fbe Compare December 26, 2020 23:11
@bblenard
Copy link
Author

I think I finished this PR and I am mostly just waiting to merge in #8809 (which is why there are some extra changes currently in this PR. I should have cherry picked the commit from pull 8809 but I didn't notice until it was too late). Also if anyone has any ideas on smart ways to test the ReclaimedSize field in the SystemPruneReport type let me know. I wasn't sure of a clean way to do it.

@rhatdan
Copy link
Member

rhatdan commented Dec 30, 2020

/approve
PR was merged so can you take this out of draft state.

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bblenard, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 30, 2020
This change adds code to report the reclaimed space after a prune.
Reclaimed space from volumes, images, and containers is recorded
during the prune call in a PruneReport struct. These structs are
collected into a slice during a system prune and processed afterwards
to calculate the total reclaimed space.

Closes containers#8658

Signed-off-by: Baron Lenardson <[email protected]>
@bblenard bblenard force-pushed the issue-8658-system-prune-reclaimed-space branch from 4ee8fbe to b90f7f9 Compare December 31, 2020 02:00
@bblenard bblenard marked this pull request as ready for review December 31, 2020 02:01
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 31, 2020
@bblenard
Copy link
Author

bblenard commented Dec 31, 2020

Assuming all of the test pass again after the rebase this PR should be ready for some review. I am concerned about the following two aspects of this PR

  1. What is the best way to add unit tests for this code, specifically the size reporting
  2. Am I breaking any contracts with the types of data returned. Specifically in places that maybe we should be returning docker api types. I'm not sure how much we/podman wants to mirror docker.

Also here is some example output from some manual testing:

test

@rhatdan
Copy link
Member

rhatdan commented Dec 31, 2020

@baude @mheon @jwhonce PTAL

@TomSweeneyRedHat
Copy link
Member

LGTM
but definitely want an OK from @mheon

@mheon
Copy link
Member

mheon commented Jan 5, 2021

LGTM, though I'm concerned this is going to absolutely murder our performance on pruning - size operations are extremely costly.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 5, 2021
@openshift-merge-robot openshift-merge-robot merged commit b84b7c8 into containers:master Jan 5, 2021
@jwhonce
Copy link
Member

jwhonce commented Jan 5, 2021

#8891 created to track restoring payloads to meet compatibility requirements

@2xB
Copy link

2xB commented Apr 25, 2021

@bblenard Just got a message of Total reclaimed space: 273.2GB, while my file system only shows 45.0 GB less (and it never used that much for container storage). Might this somehow count layers multiple times? (Or is this intended behaviour?)

PS: If that is useful, I can also create an issue for this!

@bblenard
Copy link
Author

@bblenard Just got a message of Total reclaimed space: 273.2GB, while my file system only shows 45.0 GB less (and it never used that much for container storage). Might this somehow count layers multiple times? (Or is this intended behaviour?)

PS: If that is useful, I can also create an issue for this

That is not expected behavior. I'll review this change again to see if I can figure out why that happened. If you are able to reproduce it let me know :)

@rhatdan
Copy link
Member

rhatdan commented Apr 26, 2021

Could something like reflinks help explain it?

@bblenard
Copy link
Author

At the moment my hunch is something to do with this pattern where maybe things are getting double counted...

That being said I don't know exactly where to start to track this down. If @2xB has more information / is able to reproduce that might help.

@2xB
Copy link

2xB commented May 5, 2021

@bblenard I could produce a reproducible inconsistency, although I have no idea whether it comes from the same reason.

It is packed into a single bash script, using a podman-in-podman setup to make this reproducible and relatively clean without interfering with any pre-existing containers when executing podman system prune. Just for the record, I'm using rootless Podman 3.1.0 on Manjaro Linux and inside the Fedora container is a Podman 3.1.2 during my tests. But theoretically this script should work on any system without interference with existing podman containers.

The script sets up a podman image issue8831_2 with some podman image files inside that are deleted using podman system prune -f and podman system prune -f --all, first in that order (marks 1 and 2) and then - rolling back to issue8831_2 - in the other order (marks 3 and 4).

The script (click to expand)
#!/bin/bash

# Build podman container to work in
podman build -t issue8831_1 - << EOF
FROM fedora:34

RUN dnf update -y && dnf install -y podman nano

RUN sed -i -e 's,driver = "overlay",driver = "vfs",g' /etc/containers/storage.conf && \
  rm -rf /var/lib/containers

RUN podman pull fedora:34
EOF

# Create podman files in that container, commit as issue8831_2
podman run -it --security-opt seccomp=unconfined --privileged --name issue8831_1c issue8831_1 <<EOF
cat << EOD > Dockerfile
FROM fedora:34 as target1

RUN fallocate -l 0.5GB largefile1

RUN touch smallfile1

FROM fedora:34 as target2

RUN fallocate -l 0.5GB largefile2

RUN touch smallfile2

FROM target1 as result1

COPY --from=target2 smallfile2 .

FROM fedora:34 as result2

COPY --from=target2 smallfile2 .
COPY --from=target1 smallfile1 .
EOD

podman build --target result1 -t r1 .
podman build --target result2 -t r2 .
exit
EOF
podman commit issue8831_1c issue8831_2

# Delete podman files, way 1
podman run -it --security-opt seccomp=unconfined --privileged --rm issue8831_2 <<EOF
echo "Mark 1"
podman system prune -f
echo "Mark 2"
podman system prune -f --all
exit
EOF

# Delete podman files, way 2
podman run -it --security-opt seccomp=unconfined --privileged --rm issue8831_2 <<EOF
echo "Mark 3"
podman system prune -f --all
echo "Mark 4"
podman system prune -f
exit
EOF

# Cleanup
podman rm issue8831_1c
podman rmi issue8831_2
podman rmi issue8831_1

The result - cleaned a bit:

Mark 1
[root@d4d6a28e17f0 /]# podman system prune -f
552c1a027388a2dbc95c6eaccee2e9d95ca0b3c0f17bb61af8950fec38b0c2cf
Deleted Images
0e1e85c3af4edf74c5f4731991789561a2bfabec9e0ee2e797ae33742f9f095f
Total reclaimed space: 686.5MB

Mark 2
[root@d4d6a28e17f0 /]# podman system prune -f --all
Deleted Images
registry.fedoraproject.org/fedora:34
e32001e126b4de19d3436c0358124c9604a8f13f556e944137706f96bffeb15c
07877da981cceb7bd0064af6563c800c6e91c166265671870026ccd16c663879
localhost/r1:latest
abccf02e337b261341f0fb84d7ebd7273c2b14946e7e917c4ed365fa0516e64d
localhost/r2:latest
Total reclaimed space: 2.619GB

# ---

Mark 3
[root@e45d5fc83e68 /]# podman system prune -f --all
Deleted Images
registry.fedoraproject.org/fedora:34
e32001e126b4de19d3436c0358124c9604a8f13f556e944137706f96bffeb15c
07877da981cceb7bd0064af6563c800c6e91c166265671870026ccd16c663879
552c1a027388a2dbc95c6eaccee2e9d95ca0b3c0f17bb61af8950fec38b0c2cf
0e1e85c3af4edf74c5f4731991789561a2bfabec9e0ee2e797ae33742f9f095f
localhost/r1:latest
abccf02e337b261341f0fb84d7ebd7273c2b14946e7e917c4ed365fa0516e64d
localhost/r2:latest
Total reclaimed space: 3.992GB

Mark 4
[root@e45d5fc83e68 /]# podman system prune -f
Deleted Images
Total reclaimed space: 0B

It is clear: 686.5MB + 2.619GB != 3.992GB (+ 0B )

@mheon
Copy link
Member

mheon commented May 5, 2021 via email

@bblenard
Copy link
Author

bblenard commented May 5, 2021

@2xB Thanks for the POC. I did notice something interesting while playing with it.

Although we expect bin/podman system prune -f && bin/podman system prune -f --all to report the same reclaimed space amount as bin/podman system prune -f --all what I think we are actually seeing is the first image that is removed from the first prune is double counted... for some reason. I'll have to brush up on how oci container images are organized on disk to figure out if that makes sense / how to fix that

686.5MB + 2.619GB != 3.992GB (+ 0B )
(2*686.5MB) + 2.619GB == 3.992GB (+ 0B )

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman system prune should report the reclaimed space
9 participants