kill: resync the container if runtime fails #16320

Closed
wants to merge 1 commit into from

vrothberg
Member

If the runtime fails to kill the container, there is a fair chance that the container has transitioned to another state or has already been removed. Take the lock and resync the container to check for that, so that we do not act on old and potentially outdated state.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg [email protected]

Does this PR introduce a user-facing change?

Fix a bug where `podman kill` would mistakenly fail when reading outdated state.

@containers/podman-maintainers PTAL

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2022
Member

@Luap99 Luap99 left a comment

This will deadlock; at least one caller already holds the container lock:

func (c *Container) Kill(signal uint) error {
	if !c.batched {
		c.lock.Lock()
		defer c.lock.Unlock()
		if err := c.syncContainer(); err != nil {
			return err
		}
	}
	// ...
}
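A minimal sketch of why taking the lock a second time deadlocks. Go's `sync.Mutex` is not reentrant: a goroutine that already holds it and calls `Lock` again blocks forever. The sketch uses `sync.Mutex` as a stand-in for libpod's container lock and assumes that lock is likewise non-reentrant, which is what the deadlock report implies. `killLocked` and `Batch` are hypothetical helpers; `TryLock` (Go 1.18+) is used only to show that the lock is already held without actually hanging.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Container is a hypothetical stand-in; libpod's real lock is not a sync.Mutex.
type Container struct {
	lock    sync.Mutex
	batched bool // true while Batch already holds c.lock (mirrors the snippet above)
}

// Kill takes the lock only when the caller has not, like the quoted code.
func (c *Container) Kill(signal uint) error {
	if !c.batched {
		c.lock.Lock()
		defer c.lock.Unlock()
	}
	return c.killLocked(signal)
}

// killLocked expects c.lock to be held and must never call c.lock.Lock():
// sync.Mutex is not reentrant, so that call would block forever.
func (c *Container) killLocked(signal uint) error {
	// TryLock is used only to demonstrate that the lock is already held;
	// it is not a general-purpose synchronization tool.
	if c.lock.TryLock() {
		c.lock.Unlock()
		return errors.New("killLocked called without holding the lock")
	}
	_ = signal // a real implementation would ask the runtime to deliver it
	return nil
}

// Batch holds the lock across fn and sets batched so nested calls skip locking.
func (c *Container) Batch(fn func(*Container) error) error {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.batched = true
	defer func() { c.batched = false }()
	return fn(c)
}

func main() {
	c := &Container{}
	fmt.Println(c.Kill(15))
	fmt.Println(c.Batch(func(c *Container) error { return c.Kill(9) }))
}
```

The design choice this illustrates is a clear split: exported entry points acquire the lock (or skip it when batched), while lower-level helpers assume it is held and never lock on their own.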

@tyler92
Contributor

tyler92 commented Oct 27, 2022

Yes, see also #15492.

@vrothberg
Member Author

No caller should have the lock at that point.

@vrothberg
Member Author

Ah, OK. That Kill needs to be fixed then. Thanks for highlighting that!

@vrothberg
Member Author

Updated. I've essentially moved the checks up the stack. @tyler92 @Luap99 PTAL

@@ -370,15 +372,7 @@ func (r *ConmonOCIRuntime) KillContainer(ctr *Container, signal uint, all bool)
		args = append(args, "kill", ctr.ID(), fmt.Sprintf("%d", signal))
	}
	if err := utils.ExecCmdWithStdStreams(os.Stdin, os.Stdout, os.Stderr, env, r.path, args...); err != nil {
		// Update container state - there's a chance we failed because
		// the container exited in the meantime.
		if err2 := r.UpdateContainerStatus(ctr); err2 != nil {

Member

Can't do this; it will break things further up the stack (re: SigProxy). The status update needs to remain here.

			logrus.Infof("Error updating status for container %s: %v", ctr.ID(), err2)
		}
		if ctr.ensureState(define.ContainerStateStopped, define.ContainerStateExited) {
			return fmt.Errorf("%w: %s", define.ErrCtrStateInvalid, ctr.state.State)

Member

This is also going to break SigProxy. We need this to remain ErrCtrStateInvalid.

Member Author

Can you elaborate a bit more? At least for kill and stop, the container is not locked, so we should not fiddle with the state.

Maybe we need to make this conditional?

Member

The container is locked during kill.

Member

I can see only one place where KillContainer is run unlocked. I'll fix that.

We should not be touching KillContainer as such.

Member Author

You're right. It's unlocked during stop but not kill.

Member

#16323 catches the last case where KillContainer is run unlocked.

Member Author

So you will fix #16142?

Member

Sure

@vrothberg
Member Author

Closing; @mheon wants to take a shot at this.

@vrothberg vrothberg closed this Oct 28, 2022
@vrothberg vrothberg deleted the fix-16142 branch October 28, 2022 06:40
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Labels: approved, locked - please file new issue/PR, release-note
4 participants