DNS Stops working after some time in bridge network #500
Is the aardvark-dns process still running when the DNS stops working? Does …
Yes in both cases:
The Recv-Q value seems way too high, which suggests we no longer read anything off the socket, so there is some form of logic bug. If you drop the …
There is only one active:
Can you check what container uses …

This makes sense to me then: we listen asynchronously for either an incoming UDP or TCP connection, so it never processes two connections at the same time. As long as the TCP connection doesn't send any data, we just do nothing. That is most likely something we should fix. But it would be good to know where it hangs; can you run …

Also sounds like there is a …
10.89.0.23 is occupied by the STALWART container:
I don't have GDB installed on the machine. I will install it, which requires a machine restart (rpm-ostree), and come back with the output after it "hangs" again.
You can start a container with …
I've stopped the stalwart container, and it immediately fixed the DNS issues. Now I'm wondering why it hangs after some time. It used to work flawlessly.
Oh sorry, I think you must make sure to use the exact same Fedora version image (fedora:40) and then install gdb there so that the linker and such match.
Large parts of aardvark-dns were rewritten by me for 1.12; most importantly, aardvark-dns didn't support TCP connections at all before. It is not clear to me why the TCP connection stays open; it may be our fault or the client's, but either way we need to fix this in aardvark-dns, because a single client should never be able to make the server non-functional.
I know how to reproduce the TCP hang myself, so I do not need the full stack trace from you. I'll move the issue to the aardvark-dns repo, as it is a bug there.
Thank you very much for your help. Let me know if I can assist further in any way.
Right now, for a single network, all requests are processed serially, and with TCP a caller is able to block us for a long time if it just opens the connection but sends very little or no data. To avoid this, always spawn a new task when we accept a new TCP connection.

We could do the same for UDP; however, my testing with contrib/perf/run.sh has shown that it slows things down, as the overhead of spawning a task is greater than the few quick, simple map lookups, so we only spawn where needed. We still have to spawn when forwarding external requests, as this can take a long time.

Fixes containers#500

Signed-off-by: Paul Holzinger <[email protected]>
Issue Description
The issue I have is that my podman containers stop resolving internal and external DNS after some time (~1h).
If I restart podman entirely or reboot the system, I can resolve all of the DNS records and ping between containers or the external network.
After ~1h I can no longer resolve DNS, ping between containers, or reach the outside network.
I'm running a named bridge network.
The issue started to show up after upgrading from podman 5:5.1.2-1.fc40 -> 5:5.2.1-1.fc40 and, maybe most importantly, netavark 1.11.0-1.fc40 -> 2:1.12.1-1.fc40 and
aardvark-dns 1.11.0-1.fc40 -> 2:1.12.1-1.fc40.
Steps to reproduce the issue
Describe the results you received
Right after container start:
After ~1h:
Journalctl contains the following entries:
Describe the results you expected
Network working all the time
podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
No
Additional environment details
Running Fedora IoT 40 latest
Running through compose
Additional information