
Fix: Prevent OCI runtime directory remain #14720

Merged

merged 1 commit into containers:main
Jun 29, 2022

Conversation

sstosh
Contributor

@sstosh sstosh commented Jun 24, 2022

This bug was introduced in #8906.

When we use a podman rm/restart/stop/kill etc. command on
a container running with --rm, the OCI runtime directory
remains at /run/<runtime name> (root user) or
/run/user/<user id>/<runtime name> (rootless user).

This bug could cause other bugs.
For example, when we checkpoint a container running with
--rm (podman checkpoint --export) and restore it
(podman restore --import) with crun, the error message
"Error: OCI runtime error: crun: container '<container id>' already exists" is output.
This error is caused by an attempt to restore the container with
the same container ID as the remaining OCI runtime's container ID.

Therefore, this fix makes the cleanupRuntime() function run to
remove the OCI runtime directory,
even if the container has already been removed by the --rm option.

Signed-off-by: Toshiki Sonoda <[email protected]>

Does this PR introduce a user-facing change?

This fix prevents the OCI runtime directory from remaining. It also prevents other bugs from occurring.

@mheon
Member

mheon commented Jun 24, 2022

I would expect that the cleanup process which removes containers started with --rm would handle removal of directories associated with the OCI runtime; removing a container always calls cleanup() which always calls cleanupRuntime(). Can you elaborate more as to what you're seeing?

@sstosh
Contributor Author

sstosh commented Jun 27, 2022

I would expect that the cleanup process which removes containers started with --rm would handle removal of directories associated with the OCI runtime; removing a container always calls cleanup() which always calls cleanupRuntime(). Can you elaborate more as to what you're seeing?

@mheon
Because cleanupRuntime() does nothing when cleanup() is called for ContainerStateStopping containers.
As a result, the OCI runtime directory does not get removed.

I think there are two possible plans for removing the OCI runtime directory via the cleanupRuntime() call.

  1. Ensure cleanupRuntime() to call even if the container state is ContainerStateStopping (commit: 81adde3).
  2. Change the container state from ContainerStateStopping to ContainerStateStopped and call cleanupRuntime() again (commit: 3619f0b).

The 1st plan needs to change the container state from ContainerStateStopping when cleanupRuntime() is called.
This causes the warning message from https://github.com/containers/podman/blob/main/libpod/container_internal.go#L1336
to be output when we remove a container that was not started with --rm.

The 2nd plan needs to change the container state to ContainerStateStopped before cleanupRuntime() is called.
This plan does not output the warning message.

If we choose the 1st plan, we need to deal with the warning message,
and the test scripts need fixes because the exit code changes.

I think the 2nd plan is better because it does not output warnings
and requires fewer test-script fixes than the 1st plan.
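The two plans can be sketched with a toy state model. This is illustrative Go only, not podman's actual libpod code: the Container type, its fields, and cleanupWithFix are simplified stand-ins, and only the state names mirror libpod.

```go
package main

import "fmt"

// Simplified container states, modeled after libpod's ContainerState values.
type ContainerState int

const (
	ContainerStateRunning ContainerState = iota
	ContainerStateStopping
	ContainerStateStopped
)

// Container is a toy stand-in for libpod's Container type.
type Container struct {
	state          ContainerState
	runtimeDirGone bool // true once the OCI runtime directory is removed
}

// cleanupRuntime mirrors the guard under discussion: a container still in
// Stopping is treated as running, so its runtime directory is kept.
func (c *Container) cleanupRuntime() {
	if c.state == ContainerStateStopping {
		return // nothing removed -> /run/<runtime>/<id> is left behind
	}
	c.runtimeDirGone = true
}

// cleanupWithFix models the 2nd plan: first move Stopping -> Stopped (the
// OCI runtime process has already exited at this point), then clean up.
func (c *Container) cleanupWithFix() {
	if c.state == ContainerStateStopping {
		c.state = ContainerStateStopped
	}
	c.cleanupRuntime()
}

func main() {
	buggy := &Container{state: ContainerStateStopping}
	buggy.cleanupRuntime()
	fmt.Println("without fix, dir removed:", buggy.runtimeDirGone) // false

	fixed := &Container{state: ContainerStateStopping}
	fixed.cleanupWithFix()
	fmt.Println("with fix, dir removed:", fixed.runtimeDirGone) // true
}
```

The sketch shows why the 2nd plan avoids the warning path: by the time cleanupRuntime() runs, the state is already Stopped, so the Stopping guard never fires.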

@mheon
Member

mheon commented Jun 27, 2022

The logic behind cleanup is correct; we absolutely should not attempt to clean up Stopping containers, as they are still running. The problem is that, somehow, the state transition from Stopping to Stopped is not detected (from what you've said, seemingly only in the --rm case, so most likely in part of the RemoveContainer() function). The question is how a container in the Stopping state got through this to the cleanupContainer() bits below. I can see two possibilities:
1. The stop() function for the container is returning early (it should pick up the state transition automatically if it runs to the end of the function).
2. The container is already in the Stopping state when RemoveContainer() is called, so the code block that calls stop() never runs.

Of these, 2 presently seems more likely. It seems that https://github.com/containers/podman/blob/main/libpod/runtime_ctr.go#L701 should also trigger for ContainerStateStopping (possibly with an extra check to see if the container is already stopped before calling c.stop()).
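Possibility 2 can be illustrated with a minimal sketch of the guard logic. This is hypothetical, simplified Go: needsStop and the State values here are stand-ins for the real removal-path code, not podman's actual implementation.

```go
package main

import "fmt"

// State is a simplified stand-in for libpod's ContainerState.
type State int

const (
	Running State = iota
	Stopping
	Stopped
)

// needsStop models the guard around the c.stop() call in the removal path.
// Pre-fix, the guard only fired for Running containers; the suggestion is
// that it should also fire for Stopping, so a container that was already
// mid-stop when removal started still gets its Stopping -> Stopped
// transition picked up before cleanup.
func needsStop(s State, includeStopping bool) bool {
	if s == Running {
		return true
	}
	return includeStopping && s == Stopping
}

func main() {
	// A --rm container can already be in Stopping when removal starts:
	fmt.Println("old guard calls stop():", needsStop(Stopping, false)) // false
	fmt.Println("new guard calls stop():", needsStop(Stopping, true))  // true
}
```

With the old guard, a Stopping container skips stop() entirely and reaches cleanup still in the Stopping state, which is exactly the situation where cleanupRuntime() bails out and leaves the directory behind.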

giuseppe
giuseppe previously approved these changes Jun 28, 2022
Member

@giuseppe giuseppe left a comment


LGTM

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2022
@giuseppe giuseppe dismissed their stale review June 28, 2022 09:56

saw the notes from Matt

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2022
@rhatdan
Member

rhatdan commented Jun 29, 2022

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan, sstosh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2022
@openshift-ci openshift-ci bot merged commit 2cc3f12 into containers:main Jun 29, 2022
@sstosh sstosh deleted the rm-option branch July 25, 2022 10:33
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023