container stop: release lock before calling the runtime #8906
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: vrothberg. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing ...
I don't think you ran the BATS tests...
That's not very helpful. At the moment, the tests fail locally even on master on my machine. Not sure what's going on.
I'm sorry; I don't know how to help, because I don't understand the change. What first caught my eye was the ...
@vrothberg One suggestion: I think that the container remove code may need a small change to ensure that it treats stopping containers properly, and they can still be removed - I'm concerned about ...
Thanks, @edsantiago and @mheon! Stuck in meetings for the rest of the day. The test is definitely wrong as is. I made some progress locally but will continue tomorrow morning.
It's more than the test. Look at the existing ...
Yes, there's more work. Also kill behaved oddly in local tests.
19f2892 to edd177f (compare)
@@ -758,7 +758,7 @@ func (c *Container) isStopped() (bool, error) {
 		return true, err
 	}
 
-	return !c.ensureState(define.ContainerStateRunning, define.ContainerStatePaused), nil
+	return !c.ensureState(define.ContainerStateRunning, define.ContainerStatePaused, define.ContainerStateStopping), nil
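For readers following along, here is a self-contained sketch of the semantics assumed by this diff: an ensureState-style variadic helper that reports whether the container's current state matches any of the listed states. The types and constants below are toy stand-ins, not the actual libpod code; they only illustrate why the inverted isStopped check has to enumerate the new Stopping state.

```go
package main

import "fmt"

// ContainerStatus is a toy stand-in for libpod's define.ContainerStatus;
// the constants below are hypothetical values for illustration only.
type ContainerStatus int

const (
	StateRunning ContainerStatus = iota
	StatePaused
	StateStopping
	StateStopped
)

// Container is a toy stand-in holding only the current state.
type Container struct {
	state ContainerStatus
}

// ensureState reports whether the container's current state is one of the
// given states - the shape suggested by the call site in the diff above.
func (c *Container) ensureState(states ...ContainerStatus) bool {
	for _, s := range states {
		if s == c.state {
			return true
		}
	}
	return false
}

// isStopped mirrors the changed line: a container that is Running, Paused,
// or (newly) Stopping is not considered stopped, so the inverted check must
// list the new state explicitly.
func (c *Container) isStopped() bool {
	return !c.ensureState(StateRunning, StatePaused, StateStopping)
}

func main() {
	c := &Container{state: StateStopping}
	fmt.Println(c.isStopped()) // false: still stopping, not yet stopped
}
```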
Note to self: in future we should require all of these ensureState invocations to not be inverted - that will make them safer when we add states, as all valid states will have to be explicitly listed.
+1
Or add private functions for isRunning etc.
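A possible shape for the private helpers suggested here, reusing the toy types from the sketch above (isRunning and isStopping are hypothetical names, not existing libpod functions):

```go
// Positive-form helpers: callers ask a direct question instead of inverting
// a list of states, so adding a new state cannot silently change their
// meaning. Sketch only, built on the toy ensureState above.
func (c *Container) isRunning() bool {
	return c.ensureState(StateRunning)
}

func (c *Container) isStopping() bool {
	return c.ensureState(StateStopping)
}
```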
LGTM
/lgtm
/hold cancel
After containers#8906, there is a potential race condition in the removal of running containers with `--rm`. Running containers must first be stopped, and stopping was changed to unlock the container so that commands like `podman ps` can continue to run in the meantime; however, this also means that the cleanup process can potentially run before we re-lock, removing the container out from under us and resulting in error messages from `podman rm`. The end result is unchanged: the container is still cleanly removed, but the `podman rm` command will appear to have failed.

Work around this by pinging the database after we stop the container to make sure it still exists. If it doesn't, our job is done and we can exit cleanly.

Signed-off-by: Matthew Heon <[email protected]>
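A minimal sketch of the workaround that commit message describes, using toy types rather than the real libpod state code: after the (unlocked) stop, removal re-checks whether the container still exists; if the --rm cleanup already removed it, that is treated as success rather than an error. All names here are illustrative.

```go
package main

import "fmt"

// store is a toy stand-in for the libpod database that records which
// containers still exist.
type store struct {
	containers map[string]bool
}

func (s *store) exists(id string) bool { return s.containers[id] }

// removeContainer sketches the workaround: stopping runs with the container
// unlocked, so the --rm cleanup process may remove the container before we
// re-acquire the lock. After stopping, ping the database; if the container
// is already gone, our job is done and we exit cleanly.
func removeContainer(db *store, id string, stop func(string)) error {
	stop(id) // container is unlocked while the runtime stops it

	if !db.exists(id) {
		// Already removed by the --rm cleanup: not an error.
		return nil
	}
	// ...normal removal path...
	delete(db.containers, id)
	return nil
}

func main() {
	db := &store{containers: map[string]bool{"abc": true}}
	// Simulate the race: cleanup removes the container during the stop.
	err := removeContainer(db, "abc", func(id string) { delete(db.containers, id) })
	fmt.Println(err) // <nil>: podman rm still reports success
}
```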
This bug was introduced in containers#8906. When we run 'podman rm/restart/stop/kill etc.' against a container running with --rm, the OCI runtime directory remains at /run/<runtime name> (root user) or /run/user/<user id>/<runtime name> (rootless user).

This leftover directory can cause other bugs. For example, when we checkpoint a container running with --rm (podman checkpoint --export) and restore it (podman restore --import) with crun, the error message "Error: OCI runtime error: crun: container `<container id>` already exists" is printed. The error is caused by an attempt to restore the container with the same container ID as the leftover OCI runtime container's ID.

Therefore, fix cleanupRuntime() so that it removes the OCI runtime directory even if the container has already been removed by the --rm option.

Signed-off-by: Toshiki Sonoda <[email protected]>
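A hedged sketch of the fix that commit message describes: the cleanup path removes the per-container OCI runtime directory even when the container record has already been deleted by --rm, so a later restore with the same container ID does not collide. The directory layout and the cleanupRuntime helper below are assumptions for illustration, not the real libpod implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runtimeDir returns the per-container OCI runtime directory, roughly
// /run/<runtime>/<id> for root or /run/user/<uid>/<runtime>/<id> for
// rootless users; the exact layout here is an assumption.
func runtimeDir(base, id string) string {
	return filepath.Join(base, id)
}

// cleanupRuntime removes the leftover runtime directory. The point of the
// fix: this runs even when the container has already been removed by --rm,
// so "crun: container <id> already exists" cannot be triggered by a later
// checkpoint/restore with the same container ID.
func cleanupRuntime(base, id string) error {
	dir := runtimeDir(base, id)
	if err := os.RemoveAll(dir); err != nil {
		return fmt.Errorf("removing runtime dir %s: %w", dir, err)
	}
	return nil
}

func main() {
	base, err := os.MkdirTemp("", "oci-runtime")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(base)

	id := "abc123"
	_ = os.MkdirAll(runtimeDir(base, id), 0o700)
	fmt.Println(cleanupRuntime(base, id)) // <nil>: directory is gone
}
```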
Podman defers stopping the container to the runtime, which can take some
time. Keeping the lock while waiting for the runtime to complete the stop
procedure prevents other commands from acquiring the lock, as shown in
#8501.
To improve the user experience, release the lock before invoking the
runtime, and re-acquire the lock when the runtime is finished. Also
introduce an intermediate "stopping" state to properly distinguish
stopping from "stopped" containers.
Fixes: #8501
Signed-off-by: Valentin Rothberg <[email protected]>
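A minimal sketch of the locking pattern this commit message describes, with toy types rather than the real libpod ones: mark the container as stopping, release the lock while the runtime does the slow work, then re-acquire the lock, re-read the state (another process may have changed it in the meantime), and record the final stopped state.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type status int

const (
	running status = iota
	stopping
	stopped
)

// container is a toy stand-in for libpod's Container; the real code uses a
// shared file lock and an on-disk state database instead of a sync.Mutex.
type container struct {
	mu    sync.Mutex
	state status
}

// stop shows the pattern from the commit message: the lock is released
// around the (potentially slow) runtime call so that commands like
// "podman ps" are not blocked, and an intermediate "stopping" state marks
// the container during the window where it is unlocked.
func (c *container) stop(runtimeStop func()) {
	c.mu.Lock()
	c.state = stopping // visible to other commands while we wait
	c.mu.Unlock()

	runtimeStop() // slow: defer to the OCI runtime without holding the lock

	c.mu.Lock()
	defer c.mu.Unlock()
	// In the real code the state must be re-synced here: another process
	// may have cleaned up or removed the container while we were unlocked.
	c.state = stopped
}

func main() {
	c := &container{state: running}
	c.stop(func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println(c.state == stopped) // true
}
```

The cost of this pattern is exactly the window discussed in the follow-up commits above: anything can happen to the container while it is unlocked, so the code that re-acquires the lock has to tolerate the container having been cleaned up or removed in the meantime.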