
Repeat system pruning until there is nothing removed #8599

Merged (1 commit, Dec 9, 2020)

Conversation

rhatdan (Member) commented Dec 4, 2020

Signed-off-by: Daniel J Walsh [email protected]

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 4, 2020
rhatdan (Member, Author) commented Dec 4, 2020

Fixes: #7990

rhatdan (Member, Author) commented Dec 4, 2020

@srcshelton PTAL

srcshelton (Contributor):

Okay, let me patch this into podman-2.2.0 and I'll check it out.

I'll do a system rebuild without the patch and check that multiple runs are required, and then again with it to confirm it's fixed.

This may take a good number of hours…

(Although it does look as if it can't fail to resolve the problem!)

	if pruneOptions.Volume {
		fmt.Println("Deleted Volumes")
		err = utils.PrintVolumePruneResults(response.VolumePruneReport)
	for {
Member:

Recommendation: We should put a max iteration count on this - if we haven't removed everything in, say, 25 loops, we probably have a problem.

Contributor:

Just a quick note to agree, but suggest a larger number: I've seen 40-odd manual iterations before… so set the bar at 50?

Member:

Yeah, I'm not at all a fan of a naked for loop.

Member:

If each iteration takes a beat, we might want to add a message like "Working" that adds a dot after each iteration or after some number of iterations.

Member Author:

Each successful iteration will print the pruned items, so there's no need for a heartbeat.

@rhatdan rhatdan force-pushed the prune branch 2 times, most recently from a14a8f6 to ff0d964 on December 5, 2020 11:38
srcshelton (Contributor):

... a couple of questions:

  • The earlier patch to repeatedly remove stale images when using podman image prune was said to use the same code-path as the image-pruning stage of podman system prune - is this not actually the case?

  • Did the podman image prune patch also have a maximum number of iterations set - and, if so, do both commands now share a consistent limit?

srcshelton (Contributor):

Also, I just did another system prune which removed a large number of images, and the output froze for quite some time with only half of the last hash to be removed written to the terminal. Could output buffering be disabled for this operation? (I assume an output buffer was filled when half the hash had been printed, and the delay in further output was the buffer not flushing again until it filled.)

(It could also be a symptom of another issue, such as the host system locking up temporarily due to high I/O? I didn't see any other signs to suggest this, though: other interactive tasks running at the same time weren't affected, and it was a long pause, at least 30s or so once I'd noticed it, though it could have been much longer since I was doing other things and only checking the output occasionally.)

TomSweeneyRedHat (Member):

Code LGTM, a couple of small doc nits. Any chance to get a test geared towards this?

rhatdan (Member, Author) commented Dec 7, 2020

@srcshelton podman system prune calls the same internal function to prune images as podman image prune

As it does with podman volume prune, container prune, and pod prune, so there should not be a difference. I don't think the pause was caused by the output printing, but by some kind of locking in the storage layer. If some other operation was running on your system, there is a good chance that the removal of content was blocked on a lock.

srcshelton (Contributor):

Interesting - would a wait on a lock cause the output to pause half-way through printing an image hash, though?

(Even if it is a lock-wait, unbuffered output would at least allow full lines to be printed whilst waiting?)

Whilst there were other containers running when the pause in output occurred, these had been running for several days at that point, and there was no other podman activity: nothing was starting or stopping or creating/updating images.

My question was just trying to understand how image pruning was fixed when system pruning uses the same internal function, yet system pruning has needed fixing separately. And whether there's an iteration limit for image pruning and, if so, whether the two methods of pruning images use the same limit, or whether they differ (potentially causing future confusion!)?

rhatdan (Member, Author) commented Dec 7, 2020

All of the printing is being done with fmt.Println(). We could change this, but it does seem like a corner case.

I did not do anything for image pruning. I have just added a loop to system prune to try pruning again, since one pass of pruning can free up content for another pass.

@containers/podman-maintainers PTAL

@@ -16,7 +16,7 @@ By default, volumes are not removed to prevent important data from being deleted
## OPTIONS
#### **--all**, **-a**

-Remove all unused images not just dangling ones.
+Recursively remove all unused images, not just dangling ones. (Maximum 50 iterations.)
Member:

"all unused images" ... I think we remove more than just images

fmt.Println("Deleted Volumes")
err = utils.PrintVolumePruneResults(response.VolumePruneReport)

const MAX = 50
Member:

It seems we're now working around the fact that registry.ContainerEngine().SystemPrune(...) isn't doing its job correctly. Having the code scattered between cmd/podman and pkg/domain/... seems like a recipe for trouble.

Could we move all the logic to registry.ContainerEngine().SystemPrune(...)? I think it should remove all data, even if there are more than 50 iterations. It's not friendly to use. If I do a rm -rf * I don't want to ls afterwards to check if things were really removed.

vrothberg (Member) left a comment:

Code LGTM

saschagrunert (Member) left a comment:

LGTM

openshift-ci-robot (Collaborator):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [rhatdan,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

saschagrunert (Member):

Ah alright:

Inconsistent subcommand descriptions:
  podman-system-prune.1.md         = 'Remove all unused pod, container, image and volume data'
  podman-system.1.md               = 'Remove all unused container, image and volume data'
Please ensure that the NAME section of podman-system-prune.1.md
matches the subcommand description in podman-system.1.md

@rhatdan rhatdan added the lgtm Indicates that a PR is ready to be merged. label Dec 9, 2020
@openshift-merge-robot openshift-merge-robot merged commit b875c5c into containers:master Dec 9, 2020
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023