Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eighty-six eighty-eighty #11694

Merged

Conversation

edsantiago
Copy link
Member

(Sorry, couldn't resist).

CI flakes have been coming down - thank you to everyone who has
been making them a priority.

This leaves a noisy subset that I've just been ignoring for months:

Running: podman ... -p 8080:something
...cannot listen on the TCP port: listen tcp4 :8080: bind: address already in use

Sometimes these are one-time errors resolved on 2nd try; sometimes
they fail three times, forcing CI user to hit Rerun. In all cases
they make noise in my flake logs, which costs me time.

My assumption is that this has to do with ginkgo running random
tests in parallel. Since many e2e tests simplemindedly use 8080,
collisions are inevitable.

Solution: simplemindedly replace 8080 with other (also arbitrarily
picked) numbers. This is imperfect -- it requires human developers
to pick a number NNNN and 'grep NNNN test/e2e/*' before adding
new tests, which I am 100% confident ain't gonna happen -- but
it's better than what we have now.

Side note: I considered writing and using a RandomAvailablePort()
helper, but that would still be racy. Plus, it would be a pain
to interpolate strings into so many places. Finally, with this
hand-tooled approach, if/when we do get conflicts on port NNNN,
it should be very easy to grep for NNNN, find the offending tests
that reuse that port, and fix one of them.

Signed-off-by: Ed Santiago [email protected]

(Sorry, couldn't resist).

CI flakes have been coming down - thank you to everyone who has
been making them a priority.

This leaves a noisy subset that I've just been ignoring for months:

    Running: podman ... -p 8080:something
    ...cannot listen on the TCP port: listen tcp4 :8080: bind: address already in use

Sometimes these are one-time errors resolved on 2nd try; sometimes
they fail three times, forcing CI user to hit Rerun. In all cases
they make noise in my flake logs, which costs me time.

My assumption is that this has to do with ginkgo running random
tests in parallel. Since many e2e tests simplemindedly use 8080,
collisions are inevitable.

Solution: simplemindedly replace 8080 with other (also arbitrarily
picked) numbers. This is imperfect -- it requires human developers
to pick a number NNNN and 'grep NNNN test/e2e/*' before adding
new tests, which I am 100% confident ain't gonna happen -- but
it's better than what we have now.

Side note: I considered writing and using a RandomAvailablePort()
helper, but that would still be racy. Plus, it would be a pain
to interpolate strings into so many places. Finally, with this
hand-tooled approach, if/when we _do_ get conflicts on port NNNN,
it should be very easy to grep for NNNN, find the offending tests
that reuse that port, and fix one of them.

Signed-off-by: Ed Santiago <[email protected]>
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2021
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhatdan
Copy link
Member

rhatdan commented Sep 22, 2021

/approve
/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 22, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 22, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 22, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Sep 22, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 22, 2021
@rhatdan
Copy link
Member

rhatdan commented Sep 22, 2021

@edsantiago Does this close out any issues?

@openshift-merge-robot openshift-merge-robot merged commit 420ff1d into containers:main Sep 22, 2021
@edsantiago edsantiago deleted the prevent_port_collisions branch September 22, 2021 16:38
@edsantiago
Copy link
Member Author

@rhatdan no, I never filed issues for these flakes.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants