
Avoid unnecessary timeout of 250msec when waiting on container shutdown #16088

Conversation

@alexlarsson (Contributor) commented Oct 7, 2022

When you run "podman run foo" we attach to the container, which essentially blocks until the container process exits. When that happens, podman immediately calls Container.WaitForExit(), but at this point the exit value has not yet been written to the db by conmon. This means that we almost always hit the "check for exit state; sleep 250msec" loop in WaitForExit(), delaying the exit of podman run by 250 msec.

More recent kernels (>= 5.3) support the pidfd_open() syscall, which lets you open an fd representing a pid and then poll on it to wait until the process exits. We can use this to make the first sleep exactly as long as is needed for conmon to exit (if we know its pid). If for whatever reason there are still issues, we use the old sleep loop on later iterations.

This makes "time podman run fedora true" about 200msec faster.

[NO NEW TESTS NEEDED]

Signed-off-by: Alexander Larsson [email protected]

podman run now exits immediately when conmon exits, rather than waiting for a 250msec timeout
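
For illustration, here is a minimal, self-contained sketch of the pidfd_open()+poll() approach described above, assuming Linux >= 5.3 and the golang.org/x/sys/unix package; waitForConmonExit and the sleep-child demo are hypothetical, not the code added by this PR:

package main

import (
	"fmt"
	"os"
	"os/exec"

	"golang.org/x/sys/unix"
)

// waitForConmonExit blocks until the process with the given pid exits,
// using pidfd_open() (Linux >= 5.3). It returns an error if pidfd_open
// is unavailable, in which case the caller would fall back to the old
// "check for exit state; sleep 250msec" loop.
func waitForConmonExit(pid int) error {
	pidfd, err := unix.PidfdOpen(pid, 0)
	if err != nil {
		return err // e.g. ENOSYS on kernels older than 5.3
	}
	defer unix.Close(pidfd)

	// A pidfd becomes readable (POLLIN) once the process has exited.
	fds := []unix.PollFd{{Fd: int32(pidfd), Events: unix.POLLIN}}
	_, err = unix.Poll(fds, -1)
	return err
}

func main() {
	// Demo: start a short-lived child and wait for it via its pidfd.
	cmd := exec.Command("sleep", "1")
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if err := waitForConmonExit(cmd.Process.Pid); err != nil {
		fmt.Fprintln(os.Stderr, "pidfd wait unavailable, falling back:", err)
	}
	_ = cmd.Wait() // reap the child
}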

@openshift-ci bot added the do-not-merge/release-note-label-needed label (Enforce release-note requirement, even if just None) on Oct 7, 2022
@giuseppe (Member) left a comment:

LGTM

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Oct 7, 2022
// If possible (pidfd works), the first cycle blocks until conmon dies.
// If anything goes wrong, we fall back to the old poll delay on later iterations.
fds := []unix.PollFd{{Fd: int32(conmonPidFd), Events: unix.POLLIN}}
_, _ = unix.Poll(fds, -1)
A reviewer (Member) commented on this code:

Should we handle the error here?

@alexlarsson (Contributor, Author) replied:

I was thinking of possibly handling EINTR, but I avoided it because in general any kind of "smart handling" may get things wrong and accidentally turn this into a busy-wait loop. Better to just fall back to the old "safe" timeout loop in case anything weird happens.
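
For context, the kind of EINTR handling being weighed here would look roughly like the sketch below (assuming the golang.org/x/sys/unix package; pollUntilExit is a hypothetical helper, not code from this PR). The concern is that a mistake in such a retry condition can silently become a busy-wait:

// Sketch only: retry poll() on EINTR instead of immediately falling back
// to the timeout loop. Deliberately not what the PR does.
func pollUntilExit(pidfd int) error {
	fds := []unix.PollFd{{Fd: int32(pidfd), Events: unix.POLLIN}}
	for {
		_, err := unix.Poll(fds, -1)
		if err == unix.EINTR {
			continue // interrupted by a signal: retry the poll
		}
		return err // nil once the pidfd is readable, or a real error
	}
}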

@alexlarsson force-pushed the wait-on-conmon-without-sleep branch from 3bd3aa2 to d12f5c5 on October 10, 2022 07:43
@alexlarsson force-pushed the wait-on-conmon-without-sleep branch from d12f5c5 to c34b5be on October 10, 2022 09:43
@alexlarsson (Contributor, Author) commented:

So, I was forced to apply the timeout to the poll() operation too, because for some reason test/e2e/play_kube_test.go otherwise runs into a deadlock: "podman play kube" doesn't exit because it waits for conmon to die, but conmon never dies. According to @giuseppe this is because the Pod.Start() operation grabs a lock, which then blocks taking the same lock in Pod.stopIfOnlyInfraRemains().

With the timeout we eventually run the loop even if conmon didn't die; we then find the updated exit code status in the database and exit, which in turn allows conmon to finish.

This kind of timeout is not ideal, and we should try to fix the core issue, but at least this is not a regression: we used to have a sleep loop and we still have one, only now it may exit earlier when possible.
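
Concretely, that amounts to passing a finite timeout to poll() instead of -1. A sketch of the pattern, assuming golang.org/x/sys/unix; the helper name and the pollDelay parameter are hypothetical stand-ins for the 250msec delay discussed above:

// Sketch: block on the conmon pidfd, but never longer than the old poll delay.
// If poll() times out, the caller re-checks the exit state in the database,
// exactly as the old sleep loop did; if conmon exits sooner, we return early.
func waitOnePollCycle(conmonPidFd int32, pollDelay time.Duration) {
	fds := []unix.PollFd{{Fd: conmonPidFd, Events: unix.POLLIN}}
	// unix.Poll takes the timeout in milliseconds; -1 would block forever.
	_, _ = unix.Poll(fds, int(pollDelay.Milliseconds()))
}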

@alexlarsson (Contributor, Author) added:

To be more specific: conmon calls out to podman container cleanup, which then calls pod.stopIfOnlyInfraRemains() to see if that was the last container in the pod, but that tries to take the pod lock, which is already taken by pod.Start() in the original podman process.

@rhatdan (Member) commented Oct 10, 2022

LGTM
@Luap99 @giuseppe PTAL

@mheon (Member) commented Oct 10, 2022

Are there any circumstances where Conmon could be dead, but the exit file not yet fully written to disc? We have some important Podman users on systems with extremely slow I/O, which makes me worry about this.

@giuseppe (Member) replied:

> Are there any circumstances where Conmon could be dead, but the exit file not yet fully written to disc? We have some important Podman users on systems with extremely slow I/O, which makes me worry about this.

If conmon is dead, nobody is going to write to that file, no?

@mheon (Member) commented Oct 10, 2022

Fair enough...

Code LGTM

@giuseppe (Member) left a comment:

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Oct 10, 2022
@openshift-ci bot commented Oct 10, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexlarsson, giuseppe


@giuseppe (Member) commented:

@alexlarsson could you please add release notes to the PR description?

@openshift-ci bot added the release-note label and removed the do-not-merge/release-note-label-needed label on Oct 11, 2022
@alexlarsson (Contributor, Author) replied:

@giuseppe Updated

@openshift-merge-robot merged commit 619366d into containers:main on Oct 11, 2022
@github-actions bot added the locked - please file new issue/PR label on Sep 20, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Sep 20, 2023