increase timeout of on-failure=kill system test #16112

Closed
wants to merge 1 commit into from

vrothberg
Member

The on-failure=kill system tests turned out to be flaky. Once the container has been killed, the test waits for systemd to restart the service by running `container inspect` for up to 10 seconds. The subsequent `healthcheck run` was the flake point, which suggests that the 10-second timeout is too short, presumably when the CI nodes are under pressure.

Fixes: #16075
Signed-off-by: Valentin Rothberg [email protected]

Does this PR introduce a user-facing change?

None

@openshift-ci
Contributor

openshift-ci bot commented Oct 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2022
@vrothberg
Member Author

@edsantiago WDYT?

@edsantiago
Member

I'm not usually a fan of bumping timeouts. Is there another check that can be done to at least verify intermediate progress?

I'm OOTO today, will not be able to look again, but will check first thing Wednesday morning.

@vrothberg
Member Author

> I'm not usually a fan of bumping timeouts. Is there another check that can be done to at least verify intermediate progress?

I tried coming up with an alternative approach but could not find one. What we are waiting for essentially is for the container to be recreated and started.
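
For illustration, the kind of wait being discussed can be sketched as a bounded retry loop. This is only a minimal sketch, not the code in the system tests: the helper name `wait_until` and the `WAIT_INTERVAL` variable are hypothetical, and the loop simply re-runs whatever command it is given until that command succeeds or the attempts run out. Each failed attempt is logged to stderr, which at least shows how far the wait got before it gave up.

```shell
#!/usr/bin/env bash

# wait_until ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it exits 0 or ATTEMPTS
# tries have failed. Sleeps WAIT_INTERVAL seconds (default 1) between
# tries, so ATTEMPTS is roughly the timeout in seconds by default.
wait_until() {
    local attempts=$1
    shift
    local n=0
    while (( n < attempts )); do
        # Discard the command's own output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$(( n + 1 ))
        # Report intermediate progress so a failure shows how far we got.
        echo "attempt ${n}/${attempts}: '$*' not ready yet" >&2
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1
}
```

Assuming a container named `ctr`, a call such as `wait_until 20 podman container inspect ctr` would wait up to roughly 20 seconds for the container to exist again. The value 20 is only an example, not the timeout proposed in this PR.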

@rhatdan
Member

rhatdan commented Oct 11, 2022

LGTM

@edsantiago
Member

@vrothberg can we go with #16129, if you agree with my fix?

@vrothberg vrothberg closed this Oct 12, 2022
@vrothberg vrothberg deleted the fix-16075 branch October 12, 2022 12:41
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Labels: approved, locked - please file new issue/PR, release-note-none
Development

Successfully merging this pull request may close these issues.

healthcheck run: error: ctr does not exist in database: no such container
3 participants