
Avoid unnecessary timeout of 250msec when waiting on container shutdown #16088

Conversation

@alexlarsson (Contributor) commented Oct 7, 2022

When you run "podman run foo" we attach to the container, which essentially blocks until the container process exits. When that happens, podman immediately calls Container.WaitForExit(), but at this point the exit value has not yet been written to the db by conmon. This means that we almost always hit the "check for exit state; sleep 250msec" loop in WaitForExit(), delaying the exit of podman run by 250 msec.

More recent kernels (>= 5.3) support the pidfd_open() syscall, which lets you open an fd representing a pid and then poll on it to wait until the process exits. We can use this to make the first sleep exactly as long as is needed for conmon to exit (if we know its pid). If for whatever reason there are still issues, we use the old sleep loop on later iterations.

This makes "time podman run fedora true" about 200msec faster.

[NO NEW TESTS NEEDED]

Signed-off-by: Alexander Larsson [email protected]

podman run now exits immediately when conmon exits, rather than waiting for a 250msec timeout
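
For illustration, here is a minimal, self-contained sketch of the pidfd_open()+poll() approach described above, assuming Linux >= 5.3 and the golang.org/x/sys/unix package; waitForConmonExit and the sleep-child demo are hypothetical, not the code added by this PR:

package main

import (
	"fmt"
	"os"
	"os/exec"

	"golang.org/x/sys/unix"
)

// waitForConmonExit blocks until the process with the given pid exits,
// using pidfd_open() (Linux >= 5.3). It returns an error if pidfd_open
// is unavailable, in which case the caller would fall back to the old
// "check for exit state; sleep 250msec" loop.
func waitForConmonExit(pid int) error {
	pidfd, err := unix.PidfdOpen(pid, 0)
	if err != nil {
		return err // e.g. ENOSYS on kernels older than 5.3
	}
	defer unix.Close(pidfd)

	// A pidfd becomes readable (POLLIN) once the process has exited.
	fds := []unix.PollFd{{Fd: int32(pidfd), Events: unix.POLLIN}}
	_, err = unix.Poll(fds, -1)
	return err
}

func main() {
	// Demo: start a short-lived child and wait for it via its pidfd.
	cmd := exec.Command("sleep", "1")
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if err := waitForConmonExit(cmd.Process.Pid); err != nil {
		fmt.Fprintln(os.Stderr, "pidfd wait unavailable, falling back:", err)
	}
	_ = cmd.Wait() // reap the child
}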

@openshift-ci bot added the do-not-merge/release-note-label-needed label (Enforce release-note requirement, even if just None) on Oct 7, 2022
@giuseppe (Member) left a comment:

LGTM

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Oct 7, 2022
// If possible (pidfd works), the first cycle blocks until conmon dies.
// If anything goes wrong, we fall back to the old poll delay on later iterations.
fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
_, _ = unix.Poll(fds, -1)
A reviewer (Member) commented on this code:

Should we handle the error here?

@alexlarsson (Contributor, Author) replied:

I was thinking of possibly handling EINTR, but I avoided it because in general any kind of "smart handling" may get things wrong and accidentally turn this into a busy-wait loop. Better to just fall back to the old "safe" timeout loop in case anything weird happens.
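
For context, the kind of EINTR handling being weighed here would look roughly like the sketch below (assuming the golang.org/x/sys/unix package; pollUntilExit is a hypothetical helper, not code from this PR). The concern is that a mistake in such a retry condition can silently become a busy-wait:

// Sketch only: retry poll() on EINTR instead of immediately falling back
// to the timeout loop. Deliberately not what the PR does.
func pollUntilExit(pidfd int) error {
	fds := []unix.PollFd{{Fd: int32(pidfd), Events: unix.POLLIN}}
	for {
		_, err := unix.Poll(fds, -1)
		if err == unix.EINTR {
			continue // interrupted by a signal: retry the poll
		}
		return err // nil once the pidfd is readable, or a real error
	}
}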

@alexlarsson force-pushed the wait-on-conmon-without-sleep branch from 3bd3aa2 to d12f5c5 on October 10, 2022 07:43
@alexlarsson force-pushed the wait-on-conmon-without-sleep branch from d12f5c5 to c34b5be on October 10, 2022 09:43
@alexlarsson (Contributor, Author) commented:

So, I was forced to apply the timeout to the poll() operation too, because for some reason test/e2e/play_kube_test.go otherwise runs into a deadlock: "podman play kube" doesn't exit because it waits for conmon to die, but conmon never dies. According to @giuseppe this is because the Pod.Start() operation grabs a lock, which then blocks taking the same lock in Pod.stopIfOnlyInfraRemains().

With the timeout we eventually run the loop even if conmon didn't die; we then find the updated exit code status in the database and exit, which in turn allows conmon to finish.

This kind of timeout is not ideal, and we should try to fix the core issue, but at least this is not a regression: we used to have a sleep loop and we still have one, only now it may exit earlier when possible.
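
Concretely, that amounts to passing a finite timeout to poll() instead of -1. A sketch of the pattern, assuming golang.org/x/sys/unix; the helper name and the pollDelay parameter are hypothetical stand-ins for the 250msec delay discussed above:

// Sketch: block on the conmon pidfd, but never longer than the old poll delay.
// If poll() times out, the caller re-checks the exit state in the database,
// exactly as the old sleep loop did; if conmon exits sooner, we return early.
func waitOnePollCycle(conmonPidFd int32, pollDelay time.Duration) {
	fds := []unix.PollFd{{Fd: conmonPidFd, Events: unix.POLLIN}}
	// unix.Poll takes the timeout in milliseconds; -1 would block forever.
	_, _ = unix.Poll(fds, int(pollDelay.Milliseconds()))
}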

@alexlarsson (Contributor, Author) added:

To be more specific: conmon calls out to podman container cleanup, which then calls pod.stopIfOnlyInfraRemains() to see if that was the last container in the pod, but that tries to take the pod lock, which is already taken by pod.Start() in the original podman process.

@rhatdan (Member) commented Oct 10, 2022

LGTM
@Luap99 @giuseppe PTAL

@mheon (Member) commented Oct 10, 2022

Are there any circumstances where Conmon could be dead, but the exit file not yet fully written to disc? We have some important Podman users on systems with extremely slow I/O, which makes me worry about this.

@giuseppe (Member) replied:

> Are there any circumstances where Conmon could be dead, but the exit file not yet fully written to disc? We have some important Podman users on systems with extremely slow I/O, which makes me worry about this.

If conmon is dead, nobody is going to write to that file, no?

@mheon (Member) commented Oct 10, 2022

Fair enough...

Code LGTM

@giuseppe (Member) left a comment:

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Oct 10, 2022
@openshift-ci bot commented Oct 10, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexlarsson, giuseppe


@giuseppe (Member) commented:

@alexlarsson could you please add release notes to the PR description?

@openshift-ci bot added the release-note label and removed the do-not-merge/release-note-label-needed label on Oct 11, 2022
@alexlarsson (Contributor, Author) replied:

@giuseppe Updated

@openshift-merge-robot merged commit 619366d into containers:main on Oct 11, 2022
@github-actions bot added the locked - please file new issue/PR label on Sep 20, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Sep 20, 2023