-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
container wait: indicate timeout in error #18869
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vrothberg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Not sure how this helps to debug #18860? It is obvious to me that we hit the timeout in the flake. Regardless this change LGTM, should give us some clarity without looking at the timings, especially if we start seeing these in the wild. |
I strongly suspect so as well but without the log I cannot be 100 percent sure. |
@edsantiago PTAL |
When waiting for a container, there may be a time window where conmon has already exited but the container hasn't been fully cleaned up. In that case, we give the container at most 20 seconds to be fully cleaned up. We cannot wait forever since conmon may have been killed or something else went wrong. After the timeout, we optimistically assume the container to be cleaned up and its exit code to present. If no exit code can be found, we return an error. Indicate in the error whether the timeout kicked in to help debug (transient) errors and flakes (e.g., containers#18860). [NO NEW TESTS NEEDED] Signed-off-by: Valentin Rothberg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM2 |
ready to merge |
/lgtm |
When waiting for a container, there may be a time window where conmon has already exited but the container hasn't been fully cleaned up. In that case, we give the container at most 20 seconds to be fully cleaned up. We cannot wait forever since conmon may have been killed or something else went wrong.
After the timeout, we optimistically assume the container to be cleaned up and its exit code to present. If no exit code can be found, we return an error.
Indicate in the error whether the timeout kicked in to help debug (transient) errors and flakes (e.g., #18860).
[NO NEW TESTS NEEDED]
Does this PR introduce a user-facing change?