Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e networking test: better way to get host IP #18270

Merged

Conversation

edsantiago
Copy link
Member

uber/jaeger-client-go library is deprecated. Remove it.

Only place it's used is in one e2e test, a test that is flaking
in a way that suggests that the HostIP() weighting heuristic from
that module was not actually getting the best outgoing IP address.
So, switch to using what seems to be the current best practice.
No need to make it reusable, since it's only used in one place.

Oh, also remove undesired "-dt" from two "podman run"s. In one
it's harmless, in the other it would cause a test failure under
some circumstances.

Closes: #18269 (optimistic, aren't I?)

Signed-off-by: Ed Santiago [email protected]

None

uber/jaeger-client-go library is deprecated. Remove it.

Only place it's used is in one e2e test, a test that is flaking
in a way that suggests that the HostIP() weighting heuristic from
that module was not actually getting the best outgoing IP address.
So, switch to using what seems to be the current best practice.
No need to make it reusable, since it's only used in one place.

Oh, also remove undesired "-dt" from two "podman run"s. In one
it's harmless, in the other it would cause a test failure under
some circumstances.

Closes: containers#18269   (optimistic, aren't I?)

Signed-off-by: Ed Santiago <[email protected]>
@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 19, 2023
@edsantiago
Copy link
Member Author

Justification: jaeger is deprecated.

@edsantiago
Copy link
Member Author

Woot! On the very first CI run, on debian, the flaky test ran with a 10.128.0 address and passed! Obviously that doesn't mean the flake is fixed (I haven't sifted passing-CI logs for 10.128.0) but at least it means my PR hasn't caused any major breakage.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+7 −12,712 is impressive

LGTM

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flake likely depends on the ordering of interfaces. The real reason why you see the wrong IP is because the podman bridge interfaces are NAT-ed to allow external communication for all containers without complex routing rules. So ncat sees the IP after NAT was applied which is the external ip and not the bridge ip thus the missmatch.

It is trivial to reproduce locally. Just set the outbound-addr to the podman bridge ip.
So yes this patch should definitely fix the flake is it always uses the external ip and not the bridge ip.

I also doubled check that the output-addr is not a NOP in this scenario, without it ncat shows the connection coming from 127.0.0.1 so it definitely tests what it should.

LGTM

@@ -576,15 +579,15 @@ EXPOSE 2004-2005/tcp`, ALPINE)

if strings.Contains(slirp4netnsHelp.OutputToString(), "outbound-addr") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should remove this branch logic?
outbound-addr was added in June 2020, https://github.com/rootless-containers/slirp4netns/releases/tag/v1.1.0
I doubt that we care to support versions that do not support it. RHEL 8 ships v1.2.

I know this isn't the purpose of your change and I definitely don't expect you to do it here.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, giuseppe, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Luap99,edsantiago,giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Luap99
Copy link
Member

Luap99 commented Apr 20, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2023
@openshift-merge-robot openshift-merge-robot merged commit e74a408 into containers:main Apr 20, 2023
@edsantiago edsantiago deleted the try_another_hostip branch April 20, 2023 11:09
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 27, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[e2e] network bind to HostIP: unexpected source IP
4 participants