Netavark: userns: nc: bad address 'nc-server.dns.podman' #16272

Closed
edsantiago opened this issue Oct 24, 2022 · 17 comments · Fixed by #18342

Labels: aardvark · flakes (Flakes from Continuous Integration) · locked - please file new issue/PR

Comments

@edsantiago (Member)

In podman int tests:

podman Netavark network works across user ns
...
$ podman [options] run --log-driver k8s-file --rm --net 051ae9d9bd --uidmap 0:1:4096 \
       quay.io/libpod/alpine:latest \
       sh -c echo podman | nc -w 1 nc-server.dns.podman 9480
nc: bad address 'nc-server.dns.podman'

Probably a race condition in test setup. Rare, but still triggering.

Podman run networking [It] podman Netavark network works across user ns

edsantiago added the flakes (Flakes from Continuous Integration) label Oct 24, 2022
@Luap99 (Member) commented Oct 24, 2022

I think this is the same as #14173; the DNS tests never fail alone, which means some condition is causing aardvark to fail and keeping it from working again. We reworked the start-up, so it should be better now? Do we know the netavark/aardvark version from the tests?

@edsantiago (Member, Author) commented Oct 24, 2022

  • Click one of the timestamps in the latest failure (the 10-21 one)
  • Press the Home key to go to the top of the log
  • Click the Task link
  • Press the End key to go to the bottom
  • Click the 'Run package_versions' accordion

Shows netavark-1.1.0-1.fc36-x86_64 (does not show aardvark; I'll open a PR to fix that)

[Edit: instructions above provided as reference because this isn't the last time we'll need to know package versions]
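For the record, on the CI VM itself, or any host using Fedora-style RPM packaging (an assumption; other distros ship these packages differently), the same versions can be queried directly instead of digging through the log:

$ rpm -q netavark aardvark-dns        # prints both, e.g. netavark-1.1.0-1.fc36.x86_64 plus the matching aardvark-dns build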

@Luap99 (Member) commented Oct 24, 2022

Thanks; we keep aardvark and netavark in sync, so it should be the same version. Using netavark/aardvark v1.2 might fix this, but honestly I don't know, since the underlying cause is unknown.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@MartinX3

/remove stale

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@MartinX3

/remove stale

@Luap99 (Member) commented Jan 3, 2023

@edsantiago Any new instances in your log?

Luap99 added the aardvark label Jan 3, 2023
@edsantiago (Member, Author)

@Luap99 I'm still working my way through the last few weeks of flakes. I'll try to answer later today.

@edsantiago (Member, Author)

Never mind, I forgot about cirrus-flake-grep. It shows the last instances on October 20 and 21, so I'm declaring this fixed.

@edsantiago (Member, Author)

Reopening. My disable-retries PR (#17831) shows that this is still happening even on main with updated CI VMs:

podman Netavark network works across user ns
...
# podman [options] network create f48f8e5b27
f48f8e5b27
# podman [options] run --log-driver k8s-file -d --name nc-server --net f48f8e5b27 quay.io/libpod/alpine:latest nc -l -p 9480
117a5425064b2bd6b8417e53c797b8a2271c8fcc2f5f3cfcef9cc89433b60e55
# podman [options] run --log-driver k8s-file --rm --net f48f8e5b27 --uidmap 0:1:4096 quay.io/libpod/alpine:latest sh -c echo podman | nc -w 1 nc-server.dns.podman 9480
nc: bad address 'nc-server.dns.podman'

We're probably not seeing it because it passes on retry.

edsantiago reopened this Apr 19, 2023
@edsantiago (Member, Author)

Another one, remote f37 root

@edsantiago (Member, Author)

And another one, f37 root not-remote

@Luap99 (Member) commented Apr 19, 2023

Sure, this was never really fixed. There is absolutely no guarantee that aardvark-dns is ready before the container is started.

@edsantiago (Member, Author)

There is absolutely no guarantee that aardvark-dns is ready before the container is started.

This disturbs me. For tests, it's easy to add sleep(5) and sweep that under the rug, but I can envision a number of real-world scenarios in which users will want to do DNS queries early on. (In fact I do see a /remove stale from an outside party, suggesting there is at least one real-world person experiencing this issue).

Do we have plans to fix this?

@Luap99 (Member) commented Apr 19, 2023

No plans so far. The big race condition was fixed, so the issue is now much harder to hit; that is why the flake went under your radar.

@edsantiago (Member, Author)

FWIW, here's my motivation for cleaning this up in #18342. It's failing quite often:

  • debian-12 : int remote debian-12 root host sqlite [remote]
    • 04-24 18:10 in Podman run networking [It] podman Netavark network works across user ns
    • 04-24 12:44 in Podman run networking [It] podman Netavark network works across user ns
    • 04-20 21:34 in Podman run networking [It] podman Netavark network works across user ns
  • fedora-37 : int podman fedora-37 root container sqlite
    • 04-20 21:22 in Podman run networking [It] podman Netavark network works across user ns
    • 04-20 10:58 in Podman run networking [It] podman Netavark network works across user ns
  • fedora-37 : int remote fedora-37 root host sqlite [remote]
    • 04-24 18:09 in Podman run networking [It] podman Netavark network works across user ns
    • 04-19 17:37 in Podman run networking [It] podman network works across user ns

edsantiago added a commit to edsantiago/libpod that referenced this issue Apr 25, 2023
Nasty test flake, "bad address nc-server.dns.podman"

Cause: "There is absolutely no guarantee that aardvark-dns
is ready before the container is started." (source: Paul).

Workaround (not a real solution): wait before doing a host lookup.

Also: remove a 99%-duplicate test.

Closes: containers#16272   (I hope)

Signed-off-by: Ed Santiago <[email protected]>
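For reference, the "wait before doing a host lookup" workaround amounts to something like the sketch below. This is a hypothetical illustration, not the actual diff in containers#18342; the network name "testnet" and the retry count are made up for the example. The idea is to retry the lookup inside the client container until aardvark-dns answers, then run the real nc command:

$ podman network create testnet        # "testnet" is a placeholder name
$ podman run -d --name nc-server --net testnet quay.io/libpod/alpine:latest nc -l -p 9480
$ podman run --rm --net testnet --uidmap 0:1:4096 quay.io/libpod/alpine:latest sh -c '
      i=0
      # poll (up to ~10s) until aardvark-dns resolves the server name
      until nslookup nc-server.dns.podman >/dev/null 2>&1; do
          i=$((i + 1)); [ "$i" -ge 10 ] && break
          sleep 1
      done
      echo podman | nc -w 1 nc-server.dns.podman 9480'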
github-actions bot added the locked - please file new issue/PR label Aug 26, 2023
github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2023