Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix podman stop and podman run --rmi #23670

Merged
merged 1 commit into from
Aug 20, 2024

Conversation

mheon
Copy link
Member

@mheon mheon commented Aug 19, 2024

This started off as an attempt to make podman stop on a container started with --rm actually remove the container, instead of just cleaning it up and waiting for the cleanup process to finish the removal.

In the process, I realized that podman run --rmi was rather broken. It was only done as part of the Podman CLI, not the cleanup process (meaning it only worked with attached containers) and the way it was wired meant that I was fairly confident that it wouldn't work if I did a podman stop on an attached container run with --rmi. I rewired it to use the same mechanism that podman run --rm uses, so it should be a lot more durable now, and I also wired it into podman inspect so you can tell that a container will remove its image.

Tests have been added for the changes to podman run --rmi. No tests for stop on a run --rm container as that would be racy.

Does this PR introduce a user-facing change?

Fixed a bug where the `podman stop` command would not ensure containers created with `--rm` were removed when it exited.
Fixed a bug where the `--rmi` option to `podman run` did not function correctly with detached containers.
The output of the `podman inspect` command for containers now includes a new field, `HostConfig.AutoRemoveImage`, which shows whether a container was created with the `--rmi` option set.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 19, 2024
@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Aug 19, 2024
@Luap99
Copy link
Member

Luap99 commented Aug 19, 2024

Since we were talking about this please add a Fixes #22852 and Fixes for the Jira issue to your commit

# Try again with a detached container and verify it works
cname=c_$(safename)
run_podman run -d --name $cname --rmi $NONLOCAL_IMAGE /bin/true
run_podman wait $cname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work, see my latest wait changes. podman wait waits for stop not removal so this will flake a lot.
And you cannot even wait for removing as the image is removed after the ctr is removed so there would still be a race...

I suggest you a sleep loop to wait until image exists command fails and then have some timeout in there

// We still need to clean up since we do not know if the other cleanup process is successful
if c.AutoRemove() && (errors.Is(err, define.ErrNoSuchCtr) || errors.Is(err, define.ErrCtrRemoved)) {
return nil
if c.AutoRemove() && !c.ShouldRestart(ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not call c.ShouldRestart(ctx) here, a container that is stopped by hand cannot by definition restart again.

Comment on lines 314 to 321
if (errors.Is(err, define.ErrNoSuchCtr) || errors.Is(err, define.ErrCtrRemoved)) {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now wrong, we only ignore the error in case of auto remove removed the ctr in the background.

I guess there is harm harm ignoring it here though but is is a change of logic that isn't needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ignoring is still safer - there's an inherent race around the cleanup here as we unlock after Stop(), so a podman rm can sneak in before cleanup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can yes sure, but is podman stop ctr and then ctr is removed in the middle not an error to the end user? Unless the ctr is configured for auto removed I would argue this will be a user error if they stop/remove in parallel IMO and something they should notice. But I can accept either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's an error case since the stop actually happened, and the only thing we're erroring on is cleanup - so we did what was asked, and only the cleanup logic is throwing an error.

@mheon mheon force-pushed the fix_stop_and_rmi branch 5 times, most recently from 4fcbb55 to 4122abb Compare August 19, 2024 19:00
@TomSweeneyRedHat
Copy link
Member

@mheon mheon force-pushed the fix_stop_and_rmi branch 2 times, most recently from ba1d729 to 79a3dba Compare August 19, 2024 20:13
@mheon
Copy link
Member Author

mheon commented Aug 20, 2024

@containers/podman-maintainers This is ready

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some suggestions for the test

test/system/030-run.bats Outdated Show resolved Hide resolved
test/system/030-run.bats Outdated Show resolved Hide resolved
test/system/030-run.bats Outdated Show resolved Hide resolved
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.

In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.

Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.

Fixes containers#22852
Fixes RHEL-39513

Signed-off-by: Matt Heon <[email protected]>
@mheon mheon force-pushed the fix_stop_and_rmi branch from 79a3dba to 458ba5a Compare August 20, 2024 13:51
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests LGTM

Copy link
Contributor

openshift-ci bot commented Aug 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

return fmt.Errorf("removing container %s image %s: %w", c.ID(), imageName, mErr)
}
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like if you specify AutoRemoteImage, we are no longer cleaning up the containers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the image implies removing the container, and removing the container also unmounts, cleans up the network, etc, so we should be OK.

@rhatdan
Copy link
Member

rhatdan commented Aug 20, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 20, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 43fe3eb into containers:main Aug 20, 2024
85 of 86 checks passed
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 19, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Nov 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants