Avoid unnecessary timeout of 250msec when waiting on container shutdown #16088
Conversation
LGTM
libpod/container_api.go
// If possible (pidfd works), on the first cycle we block until conmon dies.
// If that fails, we fall back to the old poll delay below.
fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
_, _ = unix.Poll(fds, -1)
should we handle the error here?
I was thinking of possibly handling EINTR, but I avoided it because, in general, any kind of "smart handling" may get things wrong and accidentally turn this into a busy-wait loop. Better to just fall back to the old "safe" timeout loop in case anything weird happens.
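For context, a sketch of what such EINTR handling could have looked like (hypothetical, reusing the conmonPidFd from the diff above; the merged code deliberately does not do this):

// Hypothetical: retry the blocking wait on EINTR instead of giving up.
// The PR deliberately avoids this, preferring the "safe" timeout loop.
fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
for {
	if _, err := unix.Poll(fds, -1); err != unix.EINTR {
		break // conmon died, or a real error: fall through to the old loop
	}
	// Interrupted by a signal: retry the wait.
}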
Force-pushed from 3bd3aa2 to d12f5c5
Force-pushed from d12f5c5 to c34b5be
So, I was forced to apply the timeout to the poll() operation too, because for some reason test/e2e/play_kube_test.go otherwise runs into a deadlock: "podman play kube" doesn't exit because it waits for conmon to die, but conmon never dies. According to @giuseppe this is because the Pod.Start() operation grabs a lock, which then blocks taking the same lock in Pod.stopIfOnlyInfraRemains(). With the timeout we eventually run the loop even if conmon didn't die, find the updated exit code status in the database, and exit, which then allows conmon to finish. This kind of timeout is not ideal, and we should try to fix the core issue, but at least this is not a regression: we used to have a sleep loop and we still have one, only now it may exit earlier when possible.
To be more specific: conmon calls out to "podman container cleanup", which then calls pod.stopIfOnlyInfraRemains() to see if that was the last container in the pod, but that tries to take the pod lock, which is already taken by pod.Start() in the original podman process.
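In other words, the workaround caps the blocking wait (a sketch assuming the 250msec interval from the commit message; it mirrors the idea rather than the exact merged code):

// Wait on the pidfd, but no longer than the old 250msec delay, so a
// conmon stuck on the pod lock cannot deadlock podman. On timeout we
// fall through and re-check the exit state in the database as before.
fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
_, _ = unix.Poll(fds, 250) // timeout in milliseconds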
Are there any circumstances where Conmon could be dead, but the exit file not yet fully written to disk? We have some important Podman users on systems with extremely slow I/O, which makes me worry about this.
If conmon is dead, nobody is going to write to that file, no?
Fair enough... Code LGTM
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alexlarsson, giuseppe. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@alexlarsson could you please add release notes to the PR description?
@giuseppe Updated
When you run "podman run foo" we attach to the container, which essentially blocks until the container process exits. When that happens podman immediately calls Container.WaitForExit(), but at this point the exit value has not yet been written to the db by conmon. This means that we almost always hit the "check for exit state; sleep 250msec" loop in WaitForExit(), delaying the exit of podman run by 250 msec.
More recent kernels (>= 5.3) support the pidfd_open() syscall, which lets you open an fd representing a pid and then poll on it to wait until the process exits. We can use this to make the first sleep exactly as long as is needed for conmon to exit (if we know its pid). If for whatever reason there are still issues, we use the old sleep loop on later iterations.
This makes "time podman run fedora true" about 200msec faster.
[NO NEW TESTS NEEDED]
Signed-off-by: Alexander Larsson [email protected]
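For illustration, here is a self-contained sketch of the approach described above, using golang.org/x/sys/unix. This is not the merged podman code: waitForConmonExit and checkExitState are hypothetical stand-ins for podman's internals.

package wait

import (
	"time"

	"golang.org/x/sys/unix"
)

// waitForConmonExit waits for conmon (conmonPid) to die. On kernels >= 5.3
// the first wait blocks on a pidfd, capped at the old 250msec interval so
// that a stuck conmon cannot deadlock us; after that it falls back to the
// old check-and-sleep loop. checkExitState is a hypothetical stand-in for
// podman's database lookup of the container exit code.
func waitForConmonExit(conmonPid int, checkExitState func() (int32, bool)) int32 {
	const waitIntervalMsec = 250 // the old poll delay

	// pidfd_open fails with ENOSYS on kernels < 5.3; in that case we just
	// skip the blocking wait and go straight to the sleep loop.
	if pidfd, err := unix.PidfdOpen(conmonPid, 0); err == nil {
		fds := []unix.PollFd{{Fd: int32(pidfd), Events: unix.POLLIN}}
		// Returns as soon as conmon exits, or after the timeout.
		_, _ = unix.Poll(fds, waitIntervalMsec)
		unix.Close(pidfd)
	}

	for {
		if exitCode, ok := checkExitState(); ok {
			return exitCode
		}
		time.Sleep(waitIntervalMsec * time.Millisecond)
	}
}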