Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set StoppedByUser earlier in the process of stopping #17077

Merged
merged 1 commit into from
Jan 12, 2023

Conversation

mheon
Copy link
Member

@mheon mheon commented Jan 11, 2023

The StoppedByUser variable indicates that the container was requested to stop by a user. It's used to prevent restart policy from firing (so that a restart=always container won't restart if the user does a podman stop. The problem is we were setting it very late in the stop() function. Originally, this was fine, but after the changes to add the new Stopping state, the logic that triggered restart policy was firing before StoppedByUser was even set - so the container would still restart.

Setting it earlier shouldn't hurt anything and guarantees that checks will see that the container was stopped manually.

Fixes #17069

Does this PR introduce a user-facing change?

Fixed a bug where containers with a restart policy set could still restart even after a manual `podman stop`.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 11, 2023
@mheon
Copy link
Member Author

mheon commented Jan 11, 2023

Still needs a test

@mheon
Copy link
Member Author

mheon commented Jan 11, 2023

Test added

@edsantiago
Copy link
Member

Do you have a sense for how easy/hard it is to trigger the restart? I started a loop-test, running against main, and have not seen it reproduce yet. (Test started a few minutes after you added your test). Any hints on how to manifest the restart bug?

@mheon
Copy link
Member Author

mheon commented Jan 11, 2023

Hm. It should be 100%. I'll look into it more...

@edsantiago
Copy link
Member

Here's what I'm seeing:

  • if I run top with restart=always, then stop it, it stays stopped. Even on main.
  • if I run date, or something else that exits by itself, and then stop that, it continues restarting despite my stop.

Does that make any sense to you? Does it help?

@edsantiago
Copy link
Member

Update: even with your PR, podman stop on the date container does not actually keep it stopped:

$ bin/podman run -d --restart=always --name foo quay.io/libpod/testimage:20221018 date
[cid]
$ bin/podman logs foo | wc -l
26
$ bin/podman stop foo
foo
$ bin/podman logs foo|wc -l
195
$ !!
216

@mheon
Copy link
Member Author

mheon commented Jan 11, 2023

I think that's probably a bug... Just a separate one. I'll look into what Docker does to be sure.

@mheon
Copy link
Member Author

mheon commented Jan 11, 2023

Yep, that's a separate bug. I'll handle it here.

@edsantiago
Copy link
Member

Ack. Any idea how to reproduce the original bug that you set out to fix?

@edsantiago
Copy link
Member

And, #17083 (purported fix for the hang) is merged, please rebase before pushing

@mheon
Copy link
Member Author

mheon commented Jan 11, 2023

@edsantiago The only reproducer I'm aware of uses podman-compose, which introduces enough questions into what's going on that I haven't been able to reproduce without it.

@rhatdan
Copy link
Member

rhatdan commented Jan 11, 2023

LGTM

Expect(stop).Should(Exit(0))

// This is ugly, but I don't see a better way
time.Sleep(10 * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need the 10? Could we get away with less? Perhaps 5?

@TomSweeneyRedHat
Copy link
Member

Fedora root tests are timing out after 90 minutes. @edsantiago or @cevich I'm seeing this in a number of places, do we know the cause?

@TomSweeneyRedHat
Copy link
Member

Changes LGTM with happy tests

@edsantiago
Copy link
Member

@TomSweeneyRedHat timeout is fixed (:crossed_fingers:) in #17083

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking for merge because test needed (as discussed in conversation, test passes on main). It's possible that there's no reproducer that will fail on main, and maybe that's fine, but I don't want anyone to restart the flakes, have them pass, and then merge this overnight.

@mheon
Copy link
Member Author

mheon commented Jan 12, 2023

At this point, I don't really know what's going on with restart policy and podman-compose, but this does not reproduce on main. Still, we aren't actually testing that restart policy works in this way anywhere, so I think this is still worth merging just to make sure we don't regress in the future. I'll rebase to fix CI.

The StoppedByUser variable indicates that the container was
requested to stop by a user. It's used to prevent restart policy
from firing (so that a restart=always container won't restart if
the user does a `podman stop`. The problem is we were setting it
*very* late in the stop() function. Originally, this was fine,
but after the changes to add the new Stopping state, the logic
that triggered restart policy was firing before StoppedByUser was
even set - so the container would still restart.

Setting it earlier shouldn't hurt anything and guarantees that
checks will see that the container was stopped manually.

Fixes containers#17069

Signed-off-by: Matthew Heon <[email protected]>
@mheon mheon force-pushed the set_stopping_early branch from 85e6be7 to 1ab833f Compare January 12, 2023 19:46
@edsantiago
Copy link
Member

SGTM. I was tempted to suggest ditching the expensive test, ... but I guess it's still a good regression check.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Jan 12, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2023
@openshift-merge-robot openshift-merge-robot merged commit 3e229b0 into containers:main Jan 12, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 15, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: podman stop results in panic: runtime error: invalid memory address or nil pointer dereference
5 participants