libpod: Remove 100msec delay during shutdown #16072

alexlarsson · 2022-10-06T15:54:07Z

When shutting down the image engine we always wait for the image event goroutine to finish writing any outstanding events. However, the loop for that always waits 100msec on every iteration. This means that (depending on the phase) shutdown is delayed by up to 100msec.

This hits "podman run" especially hard because podman is run twice (once for the run and once as cleanup via a conmon callback).

Changing the image loop to exit immediately when it receives a libimageEventsShutdown (after first checking for any outstanding events to write) improves podman run times by about 100msec on average.

[NO NEW TESTS NEEDED]

Signed-off-by: Alexander Larsson [email protected]

openshift-ci · 2022-10-06T15:54:11Z

@alexlarsson: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

giuseppe

LGTM

openshift-ci · 2022-10-06T16:07:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexlarsson, giuseppe

Needs approval from an approver in each of these files:

~~OWNERS~~ [giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan · 2022-10-06T16:44:40Z

LGTM
@mheon PTAL

mheon · 2022-10-06T16:51:10Z

LGTM. Restarted two test failures which looked like flakes.

mheon · 2022-10-06T16:51:22Z

/lgtm
/hold

mheon · 2022-10-06T18:39:12Z

System test failures look real

baude · 2022-10-07T02:50:41Z

agree

alexlarsson · 2022-10-07T07:38:06Z

Indeed, with this patch, bin/podman --events-backend=file push fedora dir:somedir doesn't log the push event.
So, apparently some other thread/goroutine is racing with shutdown, and only the arbitrary "sleep 100msec" made that work before. That doesn't seem like a reliable synchronization method though...

alexlarsson · 2022-10-07T07:59:39Z

So, the issue was that we were indeed handling the remaining events before exiting the eventloop goroutine. However, the shutdown side only blocked until the eventloop read the shutdown event, not until it finished the eventloop iteration. I fixed this by adding a second synchronization step on the close of the shutdown channel. So, we send the shutdown event, then block on the channel closing (which happens on exit from the eventloop now).

When shutting down the image engine we always wait for the image event goroutine to finish writing any outstanding events. However, the loop for that always waits 100msec on every iteration. This means that (depending on the phase) shutdown is delayed by up to 100msec.

This hits "podman run" especially hard because podman is run twice (once for the run and once as cleanup via a conmon callback).

Changing the image loop to exit immediately when it receives a libimageEventsShutdown (after first checking for any outstanding events to write) improves podman run times by about 100msec on average.

Note: We can't just block on the event loop reading the shutdown event anymore. We need to wait until it has read and processed any outstanding events, so we now send the shutdown event and then block waiting for the channel to be closed by the event loop.

[NO NEW TESTS NEEDED]

Signed-off-by: Alexander Larsson <[email protected]>
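
The two-step synchronization described above (send the shutdown event, then block until the event loop closes the channel) can be sketched in Go roughly as follows. This is an illustrative sketch under stated assumptions, not the code from this PR: the `eventLoop` type, its fields, and `newEventLoop` are hypothetical, and a string slice stands in for the events log.

```go
package main

import "fmt"

// eventLoop is a hypothetical stand-in for the image event goroutine.
type eventLoop struct {
	events   chan string   // buffered queue of pending events
	shutdown chan struct{} // carries the shutdown request; closed by the loop on exit
	written  []string      // stand-in for the events log
}

func newEventLoop() *eventLoop {
	l := &eventLoop{
		events:   make(chan string, 16),
		shutdown: make(chan struct{}),
	}
	go l.run()
	return l
}

func (l *eventLoop) run() {
	// Closing the channel on exit tells Shutdown that every queued
	// event has been written.
	defer close(l.shutdown)
	for {
		select {
		case ev := <-l.events:
			l.written = append(l.written, ev)
		case <-l.shutdown:
			// Drain whatever is still queued, then exit at once.
			for {
				select {
				case ev := <-l.events:
					l.written = append(l.written, ev)
				default:
					return
				}
			}
		}
	}
}

// Shutdown performs the two synchronization steps in order.
func (l *eventLoop) Shutdown() {
	l.shutdown <- struct{}{} // step 1: returns once the loop has read the request
	<-l.shutdown             // step 2: returns once the loop has closed the channel
}

func main() {
	l := newEventLoop()
	l.events <- "push"
	l.Shutdown()
	fmt.Println(l.written)
}
```

With step 1 alone, `Shutdown` could return while the loop is still draining, which matches the lost push event reported earlier in the thread. Reading `written` after `Shutdown` returns needs no lock here, because in Go the close of a channel happens before a receive that returns because the channel is closed.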

vrothberg

LGTM, nice catch!

rhatdan · 2022-10-07T11:13:18Z

/lgtm
/hold cancel

openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Oct 6, 2022

giuseppe approved these changes Oct 6, 2022

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2022

rhatdan removed the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Oct 6, 2022

openshift-ci bot assigned mheon Oct 6, 2022

openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Oct 6, 2022

alexlarsson force-pushed the events-shutdown-nosleep branch from 8163d8f to 600893f Compare October 7, 2022 07:56

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2022

alexlarsson force-pushed the events-shutdown-nosleep branch from 600893f to 5b71070 Compare October 7, 2022 08:13

vrothberg reviewed Oct 7, 2022

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2022

openshift-ci bot assigned rhatdan Oct 7, 2022

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2022

openshift-merge-robot merged commit 2062ab9 into containers:main Oct 7, 2022

edsantiago mentioned this pull request Oct 12, 2022

kill cpcontainer: could not be stopped... sending SIGKILL... container state improper #16142

Closed

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
