libpod: stop containers with --restart=always #18267

Luap99 · 2023-04-19T10:15:00Z

Commit 1ab833f improved the situation but it is still not enough. If you run short lived containers with --restart=always podman is basically permanently restarting them. To only way to stop this is podman stop. However podman stop does not do anything when the container is already in a not running state. While this makes sense we should still mark the container as explicitly stopped by the user.

Together with the change in shouldRestart() which now checks for StoppedByUser this makes sure the cleanup process is not going to start it back up again.

A simple reproducer is:

podman run --restart=always --name test -d alpine true
podman stop test

then check if the container is still running, the behavior is very flaky, it took me like 20 podman stop tries before I finally hit the correct window were it was stopped permanently.
With this patch it worked on the first try.

Fixes #18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test in CI. MY ginkgo v2 work seems to trigger this in play kube tests so that should catch at least some regressions. Also this may be something that should be tested at podman test days by users (#17912).

Does this PR introduce a user-facing change?

podman stop will now correctly stop containers created with --restart=always in all cases.

edsantiago

language, language

libpod/container_internal.go

Luap99 · 2023-04-19T11:26:50Z

@vrothberg @mheon PTAL

vrothberg

LGTM

libpod/container_internal.go

edsantiago · 2023-04-19T14:49:13Z

LGTM. I'd really, really like to see this behavior unambiguously documented in podman-stop(1) but that can be a followup.

edsantiago · 2023-04-19T14:58:31Z

Also:

podman/docs/source/markdown/options/restart.md

Line 8 in 53d8e4b

    
           Restart policy will not take effect if a container is stopped via the **podman kill** or **podman stop** commands.

...I think a similar fix will be needed for podman kill -a.

Luap99 · 2023-04-19T15:54:05Z

Kill should work with this correctly. It correctly sets StoppedByUser so I assume the fix in shouldRestart() applies to it as well.

I am happy to touch up man pages in a follow up.

edsantiago · 2023-04-19T15:58:19Z

Kill should work with this correctly

I did try it before reporting, and just tried it again (f3d453c1fea032db312d0794dcec076754317d3e):

$ bin/podman run -d --restart=always quay.io/libpod/testimage:20221018 date
1f4811e1e807232762bd620614b70ecb2bd713ec737efda20f1430f68e2e2846
$ bin/podman kill -a
$ bin/podman ps -a
...keeps showing "Exited (0) Less than a second ago"; once in a while shows "Up Less than a second"

Luap99 · 2023-04-19T16:01:44Z

Kill is not supposed to work on non running containers. You have to test with podman kill <name>, then you see it exists out with an error. --all is of course a NOP if the continuer is not running. If you try it often enough and hit the window were the container is really running then it works:

$ podman kill test
Error: can only kill running containers. 7b6a2199d8e9d1e5148c23706bec7851ba748de23170bdd0a1f514dcf9669e1e is in state stopped: container state improper
$ podman kill test
Error: can only kill running containers. 7b6a2199d8e9d1e5148c23706bec7851ba748de23170bdd0a1f514dcf9669e1e is in state stopped: container state improper
$ podman kill test
test
$ podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES

mheon · 2023-04-19T16:27:25Z

This seems a little iffy - it can change a container that is already in Exited state to Stopping from my read, which does not seem correct. Is just setting StoppedByUser not sufficient?

…

On Wed, Apr 19, 2023 at 12:01 Paul Holzinger ***@***.***> wrote: Kill is not supposed to work on non running containers. You have to test with podman kill <name>, then you see it exists out with an error. --all is of course a NOP if the continuer is not running. If you try it often enough and hit the window were the container is really running then it works: $ podman kill test Error: can only kill running containers. 7b6a2199d8e9d1e5148c23706bec7851ba748de23170bdd0a1f514dcf9669e1e is in state stopped: container state improper $ podman kill test Error: can only kill running containers. 7b6a2199d8e9d1e5148c23706bec7851ba748de23170bdd0a1f514dcf9669e1e is in state stopped: container state improper $ podman kill test test $ podman ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES — Reply to this email directly, view it on GitHub <#18267 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3AOCCLESCCHKZPWKQ6U7DXCAD7HANCNFSM6AAAAAAXD3AVBM> . You are receiving this because you were mentioned.Message ID: ***@***.***>

Luap99 · 2023-04-19T16:32:08Z

This seems a little iffy - it can change a container that is already in Exited state to Stopping from my read, which does not seem correct. Is just setting StoppedByUser not sufficient?

Stopping is not set when the sate is not correct see the if cannotStopErr == nil { block, so yes I am only setting StoppedByUser. The reason I choose to do it that way is to only have one c.save() because we care about the performance of these calls AFAIK.

mheon · 2023-04-19T19:34:46Z

libpod/container_internal.go

+		// commands such as `podman ps` from progressing if the container lock
+		// is held when busy-waiting for the container to be stopped.
+		c.state.State = define.ContainerStateStopping
+	}
 	if err := c.save(); err != nil {
 		return fmt.Errorf("saving container %s state before stopping: %w", c.ID(), err)


Maybe log this instead if cannotStopErr is not nil, so we don't lose the real error?

mheon · 2023-04-19T19:35:01Z

One suggestion, otherwise LGTM

Commit 1ab833f improved the situation but it is still not enough. If you run short lived containers with --restart=always podman is basically permanently restarting them. To only way to stop this is podman stop. However podman stop does not do anything when the container is already in a not running state. While this makes sense we should still mark the container as explicitly stopped by the user. Together with the change in shouldRestart() which now checks for StoppedByUser this makes sure the cleanup process is not going to start it back up again. A simple reproducer is: ``` podman run --restart=always --name test -d alpine true podman stop test ``` then check if the container is still running, the behavior is very flaky, it took me like 20 podman stop tries before I finally hit the correct window were it was stopped permanently. With this patch it worked on the first try. Fixes containers#18259 [NO NEW TESTS NEEDED] This is super flaky and hard to correctly test in CI. MY ginkgo v2 work seems to trigger this in play kube tests so that should catch at least some regressions. Also this may be something that should be tested at podman test days by users (containers#17912). Signed-off-by: Paul Holzinger <[email protected]>

Luap99 · 2023-04-20T11:10:50Z

tests are green, please merge

vrothberg

/lgtm

openshift-ci · 2023-04-20T11:11:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 19, 2023

edsantiago reviewed Apr 19, 2023

View reviewed changes

libpod/container_internal.go Outdated Show resolved Hide resolved

libpod/container_internal.go Outdated Show resolved Hide resolved

libpod/container_internal.go Outdated Show resolved Hide resolved

Luap99 force-pushed the always-stop branch from d612b6a to 4135b6a Compare April 19, 2023 11:05

Luap99 force-pushed the always-stop branch from 4135b6a to 405bcd2 Compare April 19, 2023 12:17

vrothberg reviewed Apr 19, 2023

View reviewed changes

libpod/container_internal.go Outdated Show resolved Hide resolved

Luap99 force-pushed the always-stop branch from 405bcd2 to f3d453c Compare April 19, 2023 12:59

mheon reviewed Apr 19, 2023

View reviewed changes

Luap99 force-pushed the always-stop branch from f3d453c to edb64f8 Compare April 20, 2023 09:23

vrothberg approved these changes Apr 20, 2023

View reviewed changes

openshift-ci bot assigned vrothberg Apr 20, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2023

openshift-merge-robot merged commit f570201 into containers:main Apr 20, 2023

Luap99 deleted the always-stop branch April 20, 2023 11:21

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 27, 2023

github-actions bot locked as resolved and limited conversation to collaborators Aug 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libpod: stop containers with --restart=always #18267

libpod: stop containers with --restart=always #18267

Luap99 commented Apr 19, 2023

edsantiago left a comment

Luap99 commented Apr 19, 2023

vrothberg left a comment

edsantiago commented Apr 19, 2023

edsantiago commented Apr 19, 2023

Luap99 commented Apr 19, 2023

edsantiago commented Apr 19, 2023

Luap99 commented Apr 19, 2023

mheon commented Apr 19, 2023 via email

Luap99 commented Apr 19, 2023

mheon Apr 19, 2023

Luap99 Apr 20, 2023

mheon commented Apr 19, 2023

Luap99 commented Apr 20, 2023

vrothberg left a comment

openshift-ci bot commented Apr 20, 2023

libpod: stop containers with --restart=always #18267

libpod: stop containers with --restart=always #18267

Conversation

Luap99 commented Apr 19, 2023

Does this PR introduce a user-facing change?

edsantiago left a comment

Choose a reason for hiding this comment

Luap99 commented Apr 19, 2023

vrothberg left a comment

Choose a reason for hiding this comment

edsantiago commented Apr 19, 2023

edsantiago commented Apr 19, 2023

Luap99 commented Apr 19, 2023

edsantiago commented Apr 19, 2023

Luap99 commented Apr 19, 2023

mheon commented Apr 19, 2023 via email

Luap99 commented Apr 19, 2023

mheon Apr 19, 2023

Choose a reason for hiding this comment

Luap99 Apr 20, 2023

Choose a reason for hiding this comment

mheon commented Apr 19, 2023

Luap99 commented Apr 20, 2023

vrothberg left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Apr 20, 2023