Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compose test: ongoing efforts to diagnose flakes #10031

Merged

Conversation

edsantiago
Copy link
Member

Yay, we got a failure with the new code (#10017). It shows
one ECONNRESET followed by a lot of ECONNREFUSED over an 8-second
period (actually 15s because of the second curl retry).

My hunch: the container itself is dying. No amount of retrying
will get anything to work. So, instead of the curl retry, if
curl fails, run 'docker-compose logs' and hope that it tells us
something useful when the test flakes again.

Also: DUH! Bitten by one of the most common bash pitfalls.
Checking exit status after 'local' will always be zero.
Split the declaration and the action into separate lines.

Also: if curl fails, return immediately. There's no point in
running the string output comparison.

Also: in _show_ok(), don't emit "actual/expect" messages
if both strings are empty.

Signed-off-by: Ed Santiago [email protected]

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2021
@edsantiago
Copy link
Member Author

The flaking test run, for posterity.

I tested the new code by hardcoding port=12345 just before the curl. It triggered as expected, gave some logs (successful, because the containers are of course working fine), and returned.

I suppose we just merge this and see what happens on a test failure. @Luap99, @baude, any other suggestions for what we can run in addition to / instead of docker-compose logs?

_show_ok 0 "$testname - curl failed with status $curl_rc"
_show_ok 0 "$testname - curl (port $port) failed with status $curl_rc"
# FIXME: is this useful? What else can we do to diagnose?
docker-compose logs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker-compose logs should be good but can you add podman ps -a. It would be good to know if the container is still active. Also ss -tulpn should show us if the port is binded by the rootlesskit port forwarder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you!

@edsantiago edsantiago force-pushed the compose_test_flakes_again branch from 2a041cd to 7e901ae Compare April 14, 2021 17:09
test/compose/test-compose Outdated Show resolved Hide resolved
Yay, we got a failure with the new code (containers#10017). It shows
one ECONNRESET followed by a lot of ECONNREFUSED over an 8-second
period (actually 15s because of the second curl retry).

My hunch: the container itself is dying. No amount of retrying
will get anything to work. So, instead of the curl retry, if
curl fails, run 'docker-compose logs', 'podman ps', and 'ss -tulpn'
and hope that one/more of those tells us something useful when
the test flakes again.

Also: DUH! Bitten by one of the most common bash pitfalls.
Checking exit status after 'local' will always be zero.
Split the declaration and the action into separate lines.

Also: if curl fails, return immediately. There's no point in
running the string output comparison.

Also: in _show_ok(), don't emit "actual/expect" messages
if both strings are empty.

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago edsantiago force-pushed the compose_test_flakes_again branch from 7e901ae to 1cf2b3e Compare April 14, 2021 18:32
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhatdan
Copy link
Member

rhatdan commented Apr 14, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2021
@openshift-merge-robot openshift-merge-robot merged commit df6c7c2 into containers:master Apr 14, 2021
@edsantiago edsantiago deleted the compose_test_flakes_again branch April 15, 2021 11:12
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants