kill: resync the container if runtime fails #16320
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vrothberg The full list of commands accepted by this bot can be found here. The pull request process is described here
This will deadlock: at least one caller already holds the container lock:
podman/libpod/container_api.go
Lines 218 to 226 in 47bcd10
    func (c *Container) Kill(signal uint) error {
        if !c.batched {
            c.lock.Lock()
            defer c.lock.Unlock()
            if err := c.syncContainer(); err != nil {
                return err
            }
        }
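The deadlock concern above comes from Go's `sync.Mutex` not being reentrant: if a caller already holds the lock, a second `Lock()` further down the call chain blocks forever. The following is a minimal sketch of that situation, not podman's code. The `container` type and its methods are hypothetical stand-ins, and `TryLock` (available since Go 1.18) is used only so the example can observe the blocked state without hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// container is a hypothetical stand-in for libpod's Container; only the
// lock is modelled.
type container struct {
	lock sync.Mutex
}

// relockWouldBlock reports whether calling lock.Lock() right now would block.
// TryLock lets the sketch check this without actually deadlocking.
func (c *container) relockWouldBlock() bool {
	if c.lock.TryLock() {
		c.lock.Unlock()
		return false
	}
	return true
}

// kill mimics an API entry point that takes the lock and then calls into
// code that would try to take the same lock again.
func (c *container) kill() (nestedLockBlocks bool) {
	c.lock.Lock()
	defer c.lock.Unlock()
	return c.relockWouldBlock()
}

func main() {
	c := &container{}
	fmt.Println("nested Lock would block:", c.kill())
	fmt.Println("lock free afterwards:", !c.relockWouldBlock())
}
```

In real code the nested call would use `Lock()` rather than `TryLock()`, and the goroutine would never return.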
Yes, SA #15492
No caller should have the lock at that point.
Ah, OK. That Kill needs to be fixed then. Thanks for highlighting that!
If the runtime fails to kill the container, there is a fair chance that the container has transitioned to another state or has already been removed. Take the lock and resync the container to check for that, so that we do not read stale state. [NO NEW TESTS NEEDED] Signed-off-by: Valentin Rothberg <[email protected]>
@@ -370,15 +372,7 @@ func (r *ConmonOCIRuntime) KillContainer(ctr *Container, signal uint, all bool)
        args = append(args, "kill", ctr.ID(), fmt.Sprintf("%d", signal))
    }
    if err := utils.ExecCmdWithStdStreams(os.Stdin, os.Stdout, os.Stderr, env, r.path, args...); err != nil {
        // Update container state - there's a chance we failed because
        // the container exited in the meantime.
        if err2 := r.UpdateContainerStatus(ctr); err2 != nil {
Can't do this, this is going to break things further up the stack re: Sigproxy. The update needs to remain here.
            logrus.Infof("Error updating status for container %s: %v", ctr.ID(), err2)
        }
        if ctr.ensureState(define.ContainerStateStopped, define.ContainerStateExited) {
            return fmt.Errorf("%w: %s", define.ErrCtrStateInvalid, ctr.state.State)
This is also going to break SigProxy. We need this to remain ErrCtrStateInvalid.
Can you elaborate a bit more? At least for kill and stop, the container is not locked, so we should not fiddle with the state.
Maybe we need to make this conditional?
The container is locked during kill.
I can see only one place where KillContainer is run unlocked. I'll fix that.
We should not be touching KillContainer as such.
You're right. It's unlocked during stop but not kill.
#16323 catches the last case where KillContainer is run unlocked
I can see only one place where KillContainer is run unlocked. I'll fix that.
So you will fix #16142?
Sure
Closing, @mheon wants to take a shot at this.
If the runtime fails to kill the container, there is a fair chance that the container has transitioned to another state or has already been removed. Take the lock and resync the container to check for that, so that we do not read stale state.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg [email protected]
Does this PR introduce a user-facing change?
@containers/podman-maintainers PTAL