Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpod: Remove 100msec delay during shutdown #16072

Merged

Conversation

alexlarsson
Copy link
Contributor

When shutting down the image engine we always wait for the image even goroutine to finish writing any outstanding events. However, the loop for that always waits 100msec every iteration. This means that (depending on the phase) shutdown is always delayed up to 100msec.

This is delaying "podman run" extra much because podman is run twice (once for the run and once as cleanup via a conmon callback).

Changing the image loop to exit immediately when a libimageEventsShutdown (but first checking for any outstanding events to write) improves podman run times by about 100msec on average.

[NO NEW TESTS NEEDED]

Signed-off-by: Alexander Larsson [email protected]

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 6, 2022

@alexlarsson: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Oct 6, 2022
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexlarsson, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2022
@rhatdan rhatdan removed the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Oct 6, 2022
@rhatdan
Copy link
Member

rhatdan commented Oct 6, 2022

LGTM
@mheon PTAL

@mheon
Copy link
Member

mheon commented Oct 6, 2022

LGTM. Restarted two test failures which looked like flakes.

@mheon
Copy link
Member

mheon commented Oct 6, 2022

/lgtm
/hold

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Oct 6, 2022
@mheon
Copy link
Member

mheon commented Oct 6, 2022

System test failures look real

@baude
Copy link
Member

baude commented Oct 7, 2022

agree

@alexlarsson
Copy link
Contributor Author

Indeed, with this patch, bin/podman --events-backend=file push fedora dir:somedir doesn't log the push event.
So, apparently some other thread/goroutine is racing with shutdown, and only the arbitrary "sleep 100msec" made that work before. That doesn't seem like a reliable synchronization method though...

@alexlarsson alexlarsson force-pushed the events-shutdown-nosleep branch from 8163d8f to 600893f Compare October 7, 2022 07:56
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2022
@alexlarsson
Copy link
Contributor Author

So, the issue was that we were indeed handling the remaining events before exiting the eventloop goroutine. However, the shutdown side only blocked until the eventloop read the shutdown event, not until it finished the eventloop iteration. I fixed this by adding a second synchronization step on the close of the shutdown channel. So, we send the shutdown event, then block on the channel closing (which happens on exit from the eventloop now).

When shutting down the image engine we always wait for the image
even goroutine to finish writing any outstanding events. However,
the loop for that always waits 100msec every iteration. This means
that (depending on the phase) shutdown is always delayed up to 100msec.

This is delaying "podman run" extra much because podman is run twice
(once for the run and once as cleanup via a conmon callback).

Changing the image loop to exit immediately when a libimageEventsShutdown
(but first checking for any outstanding events to write) improves podman
run times by about 100msec on average.

Note: We can't just block on the event loop reading the shutdown event
anymore, we need to wait until it read and processed any outstanding
events, so we now send the shutdown event and then block waiting for the
channel to be closed by the event loop.

[NO NEW TESTS NEEDED]

Signed-off-by: Alexander Larsson <[email protected]>
@alexlarsson alexlarsson force-pushed the events-shutdown-nosleep branch from 600893f to 5b71070 Compare October 7, 2022 08:13
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice catch!

@rhatdan
Copy link
Member

rhatdan commented Oct 7, 2022

/lgtm
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2022
@openshift-merge-robot openshift-merge-robot merged commit 2062ab9 into containers:main Oct 7, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants