podman wait: new timeout, possibly deadlock #14761
Sigh. Does anyone know why the bot removes my `remote` label?
Yes, because the action for some reason automatically removes the label if the regex does not match, and from my quick look there is no way to turn this off.
Ohhhhhhhh... this: `podman/.github/issue-labeler.yml`, lines 11 to 13 at commit d095053
...which, in its documentation, states
...which seems stupid to me: if the reporter has taken the time to explicitly set a label, an inflexible rule should not override it. This is such an obvious bug that there's already an issue open for it. Unfortunately, it's been ignored for two years. Oh well. Thanks for the pointer @Luap99. I guess we have to live with it for now.
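For context, issue-labeler configs map a label name to a list of regexes matched against the issue body. A hypothetical fragment in that shape (the label name and pattern here are illustrative, not podman's actual config) would look like:

```yaml
# .github/issue-labeler.yml (illustrative fragment, not the real file)
# When the issue body matches the regex, the action adds the label;
# as discussed above, it also removes the label again whenever the
# regex does NOT match, even if a human set the label by hand.
remote:
  - '\bremote\b'
```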
@mheon it looks very similar to what I've been observing in the GitLab PR: a container gets killed, and all subsequent attempts to wait for it, or even to remove it, time out.
I extracted the following reproducer:

```sh
echo "..."
date
echo run
$PODMAN run -d --replace --name=123 alpine sh -c "trap 'echo Received SIGTERM, ignoring' SIGTERM; echo READY; while :; do sleep 0.2; done"
echo stop
$PODMAN stop -t 3 123 &
echo kill
$PODMAN kill 123
echo wait
$PODMAN wait 123
```

Works with local podman. Failed on the 2nd run with podman-remote.
Note that the concurrent …
Started looking into it again. Just saw the following error on the server side:
No analyses yet, just want to share breadcrumbs.
Another thing that looks suspicious when running podman-remote:

Please ignore: it was a testing fart on my end.
The deadlock happens when the container is in the "stopping" state during kill.
Ah, got it. I'll wrap up a PR. The problem was that …
Make sure to record the exit code after killing a container. Otherwise, a concurrent `stop` may not record the exit code and leave the container unusable. Fixes: containers#14761 Signed-off-by: Valentin Rothberg <[email protected]>
Make sure `Sync()` handles state transitions and exit codes correctly. The function was previously only called when batching, which could leave containers in an unusable state when running concurrently with other state-altering functions/commands, since the state must be re-read from the database before acting upon it. Fixes: containers#14761 Signed-off-by: Valentin Rothberg <[email protected]>
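The second commit's "re-read state from the database before acting" pattern can be sketched as follows (a toy model with hypothetical names, not podman's real BoltDB state API): a container object caches its state in memory, a concurrent command changes the authoritative state, and acting on the stale cache without a sync is the bug.

```go
package main

import (
	"fmt"
	"sync"
)

// stateDB is a toy stand-in for the authoritative container database.
type stateDB struct {
	mu     sync.Mutex
	states map[string]string
}

func (db *stateDB) get(id string) string {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.states[id]
}

func (db *stateDB) set(id, s string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.states[id] = s
}

// ctr caches its state in memory; acting on the cached copy while a
// concurrent command mutates the database is the hazard.
type ctr struct {
	id     string
	cached string
}

// sync re-reads the authoritative state, which is what the fix ensures
// happens before acting on a container.
func (c *ctr) sync(db *stateDB) { c.cached = db.get(c.id) }

func main() {
	db := &stateDB{states: map[string]string{"123": "running"}}
	c := &ctr{id: "123", cached: "running"}

	// Meanwhile, a concurrent `stop` moves the container to
	// "stopping" behind our back.
	db.set("123", "stopping")

	fmt.Println("stale cached state:", c.cached) // still "running"
	c.sync(db)
	fmt.Println("after sync:", c.cached) // "stopping"
}
```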
Started right after #14685, but please don't anybody get tunnel vision on it: correlation is not causation, etc.
Seen with root and rootless; Fedora 35, 36, and Ubuntu. So far, podman-remote only. Once it triggers, the entire system is unusable: everything podman hangs, and tests die after the Cirrus timeout.
[sys] 101 podman stop - unlock while waiting for timeout
(Labeling remote, and waiting to see if the stupid bot removes the tag. I'm betting it will.)