Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpod: stop containers with --restart=always #18267

Merged
merged 1 commit into from
Apr 20, 2023

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Apr 19, 2023

Commit 1ab833f improved the situation but it is still not enough. If you run short lived containers with --restart=always podman is basically permanently restarting them. To only way to stop this is podman stop. However podman stop does not do anything when the container is already in a not running state. While this makes sense we should still mark the container as explicitly stopped by the user.

Together with the change in shouldRestart() which now checks for StoppedByUser this makes sure the cleanup process is not going to start it back up again.

A simple reproducer is:

podman run --restart=always --name test -d alpine true
podman stop test

then check if the container is still running, the behavior is very flaky, it took me like 20 podman stop tries before I finally hit the correct window were it was stopped permanently.
With this patch it worked on the first try.

Fixes #18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test in CI. MY ginkgo v2 work seems to trigger this in play kube tests so that should catch at least some regressions. Also this may be something that should be tested at podman test days by users (#17912).

Does this PR introduce a user-facing change?

podman stop will now correctly stop containers created with --restart=always in all cases.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 19, 2023
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

language, language

libpod/container_internal.go Outdated Show resolved Hide resolved
libpod/container_internal.go Outdated Show resolved Hide resolved
libpod/container_internal.go Outdated Show resolved Hide resolved
@Luap99
Copy link
Member Author

Luap99 commented Apr 19, 2023

@vrothberg @mheon PTAL

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

libpod/container_internal.go Outdated Show resolved Hide resolved
@edsantiago
Copy link
Member

LGTM. I'd really, really like to see this behavior unambiguously documented in podman-stop(1) but that can be a followup.

@edsantiago
Copy link
Member

Also:

Restart policy will not take effect if a container is stopped via the **podman kill** or **podman stop** commands.

...I think a similar fix will be needed for podman kill -a.

@Luap99
Copy link
Member Author

Luap99 commented Apr 19, 2023

Kill should work with this correctly. It correctly sets StoppedByUser so I assume the fix in shouldRestart() applies to it as well.

I am happy to touch up man pages in a follow up.

@edsantiago
Copy link
Member

Kill should work with this correctly

I did try it before reporting, and just tried it again (f3d453c1fea032db312d0794dcec076754317d3e):

$ bin/podman run -d --restart=always quay.io/libpod/testimage:20221018 date
1f4811e1e807232762bd620614b70ecb2bd713ec737efda20f1430f68e2e2846
$ bin/podman kill -a
$ bin/podman ps -a
...keeps showing "Exited (0) Less than a second ago"; once in a while shows "Up Less than a second"

@Luap99
Copy link
Member Author

Luap99 commented Apr 19, 2023

Kill is not supposed to work on non running containers. You have to test with podman kill <name>, then you see it exists out with an error. --all is of course a NOP if the continuer is not running. If you try it often enough and hit the window were the container is really running then it works:

$ podman kill test
Error: can only kill running containers. 7b6a2199d8e9d1e5148c23706bec7851ba748de23170bdd0a1f514dcf9669e1e is in state stopped: container state improper
$ podman kill test
Error: can only kill running containers. 7b6a2199d8e9d1e5148c23706bec7851ba748de23170bdd0a1f514dcf9669e1e is in state stopped: container state improper
$ podman kill test
test
$ podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES

@mheon
Copy link
Member

mheon commented Apr 19, 2023 via email

@Luap99
Copy link
Member Author

Luap99 commented Apr 19, 2023

This seems a little iffy - it can change a container that is already in Exited state to Stopping from my read, which does not seem correct. Is just setting StoppedByUser not sufficient?

Stopping is not set when the sate is not correct see the if cannotStopErr == nil { block, so yes I am only setting StoppedByUser. The reason I choose to do it that way is to only have one c.save() because we care about the performance of these calls AFAIK.

// commands such as `podman ps` from progressing if the container lock
// is held when busy-waiting for the container to be stopped.
c.state.State = define.ContainerStateStopping
}
if err := c.save(); err != nil {
return fmt.Errorf("saving container %s state before stopping: %w", c.ID(), err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe log this instead if cannotStopErr is not nil, so we don't lose the real error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mheon
Copy link
Member

mheon commented Apr 19, 2023

One suggestion, otherwise LGTM

Commit 1ab833f improved the situation but it is still not enough.
If you run short lived containers with --restart=always podman is
basically permanently restarting them. To only way to stop this is
podman stop. However podman stop does not do anything when the
container is already in a not running state. While this makes sense we
should still mark the container as explicitly stopped by the user.

Together with the change in shouldRestart() which now checks for
StoppedByUser this makes sure the cleanup process is not going to start
it back up again.

A simple reproducer is:
```
podman run --restart=always --name test -d alpine true
podman stop test
```
then check if the container is still running, the behavior is very
flaky, it took me like 20 podman stop tries before I finally hit the
correct window were it was stopped permanently.
With this patch it worked on the first try.

Fixes containers#18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test
in CI. MY ginkgo v2 work seems to trigger this in play kube tests so
that should catch at least some regressions. Also this may be something
that should be tested at podman test days by users (containers#17912).

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99
Copy link
Member Author

Luap99 commented Apr 20, 2023

tests are green, please merge

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit f570201 into containers:main Apr 20, 2023
@Luap99 Luap99 deleted the always-stop branch April 20, 2023 11:21
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 27, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

restart=always not properly working with podman stop
5 participants