mdns resolution only works on the second try #146406
uggggggh ok, now that I have taken a closer look at the tcpdump output... The very first request that goes out is a PTR request to the upstream DNS server(?) for the IP address of the AP(???). So it already knows the address?
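For reference, the name that such a PTR query asks about is the address with its octets reversed, placed under in-addr.arpa (ip6.arpa for IPv6). Python's standard `ipaddress` module can compute it directly. This is a small sketch: `ptr_query_name` is a made-up helper, and the address is a placeholder, not the AP's real one.

```python
import ipaddress


def ptr_query_name(addr: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup for `addr` asks about."""
    # reverse_pointer is provided by the stdlib for both IPv4 and IPv6 addresses
    return ipaddress.ip_address(addr).reverse_pointer


# Hypothetical AP address, for illustration only
print(ptr_query_name("192.168.1.1"))  # 1.1.168.192.in-addr.arpa
```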
I replaced
tcpdump (from the laptop):
So, pretty much the same as with
Restarting
Snippets from an A request, without nscd:
Another request (times out):
Reading
NSS stuff starts:
DNS query:
More NSS modules:
And finally a failure. So, IIUC, all these NSS failures are expected on NixOS, since none of those modules are installed in the standard locations where NSS can find them. The modules are supposed to be loaded by nscd, and the real problem here is that it times out.
And here are snippets from the second resolution attempt:
And success. So it looks like, on the second try, glibc likes nscd’s response to the first request much better and proceeds down a completely different code path.
Debug logs from nscd: Resolving google.com:
Resolving google.com (again):
Resolving lechat.local (first try):
Resolving lechat.local (second try):
So, the logs are exactly the same... However, the resolution process is definitely different.
Here is a fun fact: glibc makes no distinction between nscd being unavailable/unusable and a timeout. Another fun fact is that there is absolutely no documentation on nscd, oh well. So what happens, according to strace, is this:
Here is a weird thing: if I try to resolve the name of a device that does not exist on the network, I do see an ordinary mDNS query as the very first thing. (Then it returns nothing, the nscd socket read times out, and the same thing happens: NSS, error.) So it looks like the mdns module somehow knows that this device exists, so it does not try to resolve its name with an
The latest news is that I replaced umdns with Avahi on the router and it all... seems to work somehow? Although there are still no mDNS queries whatsoever :/.
Alright, I think the issue is indeed the of the PTR record. And the reason I did not see any queries is that a lot of stuff gets cached in Avahi. So I shouldn’t have focused on nscd so much, but rather on Avahi.
returns immediately when Avahi is running on the router, since it publishes its PTR record, which gets cached by Avahi on the laptop. With My guess is that, for some reason, when glibc asks nscd to resolve a name, nscd goes ahead and also looks for a PTR record, which times out in Avahi. This is a bit weird, since I never asked for the reverse mapping, and now I end up with an unresolved name that could easily have been resolved. I don’t think the mDNS RFC requires responding to PTR records, so I would say it is a bug either in nscd or in the mdns nss module?
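One plausible (unconfirmed) explanation for the "keeps working if I retry within a minute or two" window is record caching. RFC 6762 recommends a 120-second TTL for mDNS records tied to a host name, including reverse-mapping PTR records. Once a lookup has pulled the record into Avahi's cache, any retry inside that window is served locally. Below is a toy model of that behaviour. `TtlCache` is hypothetical and is not how Avahi is implemented; the clock is injectable so expiry can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy record cache whose entries expire `ttl` seconds after insertion.

    Hypothetical illustration of TTL-based caching, not Avahi's actual code.
    """

    def __init__(self, ttl: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl          # 120 s: RFC 6762's recommendation for host-name records
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, value: str) -> None:
        # Store the value along with the moment it stops being valid
        self._entries[name] = (value, self.clock() + self.ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: forget it, so the next lookup has to go to the network
            del self._entries[name]
            return None
        return value


if __name__ == "__main__":
    now = [0.0]
    cache = TtlCache(clock=lambda: now[0])
    # Hypothetical reverse-mapping record for a placeholder address
    cache.put("1.1.168.192.in-addr.arpa", "lechat.local")
    now[0] = 60.0
    print(cache.get("1.1.168.192.in-addr.arpa"))  # lechat.local (still cached)
    now[0] = 180.0
    print(cache.get("1.1.168.192.in-addr.arpa"))  # None (expired)
```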
omg, here is another completely crazy fact: If, when calling
Yes, ok, the I suspect that if I report a bug against glibc, their response will be that nscd is just a cache: glibc does the right thing by giving up on waiting for it and going ahead and resolving the name itself, and the fact that on NixOS the NSS configuration differs between ordinary binaries and nscd is not their problem :(.
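The behaviour described above can be modelled roughly as follows. The client asks the cache daemon first. If the daemon is missing, unusable, or just too slow, the client gives up on it and resolves in-process. All names here (`lookup_with_cache_fallback`, `query_daemon`, `resolve_in_process`, `DaemonUnavailable`) are hypothetical, and this is not glibc's actual code. It only shows why the three failure modes look identical to the caller.

```python
from typing import Callable, List


class DaemonUnavailable(OSError):
    """Hypothetical: the cache daemon's socket is missing or the request is refused."""


def lookup_with_cache_fallback(
    name: str,
    query_daemon: Callable[[str, float], List[str]],
    resolve_in_process: Callable[[str], List[str]],
    timeout: float = 5.0,  # arbitrary placeholder, not glibc's actual value
) -> List[str]:
    """Ask the cache daemon first; on any failure, resolve in-process instead."""
    try:
        return query_daemon(name, timeout)
    except OSError:
        # DaemonUnavailable and TimeoutError are both OSError subclasses, so
        # "not there", "unusable" and "too slow" all land on the same path
        pass
    # Fallback: this runs with the calling process's own NSS configuration,
    # which is not necessarily the configuration the daemon itself uses
    return resolve_in_process(name)
```

If the analysis in this thread is right, the fallback branch is exactly where the laptop goes wrong. The in-process NSS setup cannot find the mdns modules that nscd would have loaded.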
@kirelagin it's even crazier. nscd can't really "not cache" (see the linked comment in #135888), and client code uses If there was a recent forward lookup, chances are this finds the reverse lookup. And that's why GETAI isn't called in these cases.
Let's close this; this shouldn't be an issue with nsncd anymore, which nixpkgs master just switched to by default. Please reopen if this is still an issue, or if you need to stick with glibc-nscd for some reason.
Ok, so I have been living with this issue for a very long time now, I think I am finally fed up enough to start looking into it.
My setup:
umdns (this does not really matter), publishing its mDNS name as lechat.local
services.avahi.nssmdns = true;
Here is what I see on my laptop:
So, the symptom is: every time it tries to resolve an mDNS name, it fails with “System error”, but if I retry immediately, it works. It keeps working as long as I retry within a certain period of time (no idea how long; a minute? maybe a couple of minutes?).
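A small retry loop around `getaddrinfo` is enough to poke at the "fails once, then works" behaviour without going through ping. This is only a sketch. `resolve_with_retry` is a made-up helper, and the resolver is injectable so it can be tried against a fake. The "System error" text is what glibc's `gai_strerror()` prints for `EAI_SYSTEM`, which CPython raises as a plain `OSError` rather than `socket.gaierror`; catching `OSError` covers both. On the affected laptop, I would expect something like `resolve_with_retry("lechat.local", attempts=3, delay=1.0)` to print one failed attempt and then succeed on attempt 2.

```python
import socket
import time
from typing import Callable, List, Tuple


def resolve_with_retry(
    name: str,
    attempts: int = 2,
    delay: float = 0.0,
    resolver: Callable[..., list] = socket.getaddrinfo,
) -> Tuple[int, List[str]]:
    """Resolve `name`, retrying on failure.

    Returns (attempt number, sorted addresses) for the first attempt that
    succeeds; re-raises the last error if every attempt fails.
    """
    last_error: OSError = OSError(f"no attempts made for {name}")
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address for both IPv4 and IPv6
            return attempt, sorted({info[4][0] for info in infos})
        except OSError as err:
            # socket.gaierror subclasses OSError, and CPython reports
            # EAI_SYSTEM as a plain OSError, so one clause covers both
            last_error = err
            print(f"attempt {attempt}: {name}: {err}")
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```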
Here is what I’ve got so far.
I am pretty sure that DNS-SD is set up correctly everywhere except for name resolution on my laptop, i.e. Avahi sees everything:
and, in general, in tcpdump I see the AP correctly responding to mDNS resolution requests with its name.
Generated nsswitch.conf:
The weird thing is that I don’t see any mDNS requests at all whenever I run either of the ping commands. However, I do see a regular DNS request which takes some time and, I think, “System error” is printed once it fails:
To sum up, currently:
mdns_minimal just returns nothing straight away and the resolution proceeds to dns.
dns to return nothing and then fails with “System error”.
mdns and is somehow able to resolve the name.
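The summary above matches how glibc walks the `hosts:` line of nsswitch.conf. Each source returns SUCCESS, NOTFOUND, UNAVAIL or TRYAGAIN. Per nsswitch.conf(5), the default is to return on SUCCESS and continue otherwise. A bracketed item like `[NOTFOUND=return]` overrides that for the source right before it. glibc skips a source whose module cannot be loaded, as if it had returned UNAVAIL. So if `mdns_minimal` cannot be loaded in the calling process (see the NixOS module-location discussion above), it never gets to answer NOTFOUND, and the walk falls through to `dns`. That would fit "mdns_minimal just returns nothing straight away".

Below is a simplified simulator of those rules. It handles `!` negation but not `merge`. The hosts line is an assumed example in the style nss-mdns recommends, not the generated file from this report.

```python
import re
from typing import Dict, List, Tuple

STATUSES = ("SUCCESS", "NOTFOUND", "UNAVAIL", "TRYAGAIN")
# Defaults from nsswitch.conf(5): return on SUCCESS, continue on anything else
DEFAULT_ACTIONS = {"SUCCESS": "return", "NOTFOUND": "continue",
                   "UNAVAIL": "continue", "TRYAGAIN": "continue"}

_TOKEN = re.compile(r"\[[^\]]*\]|[^\s\[\]]+")  # a source name or a [...] item
_CRITERION = re.compile(r"(!?)\s*([A-Za-z]+)\s*=\s*([A-Za-z]+)")


def parse_sources(spec: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse the right-hand side of a `hosts:` line into (source, actions) pairs."""
    sources: List[Tuple[str, Dict[str, str]]] = []
    for token in _TOKEN.findall(spec):
        if not token.startswith("["):
            sources.append((token, dict(DEFAULT_ACTIONS)))
            continue
        if not sources:
            raise ValueError("action item before any source")
        actions = sources[-1][1]  # [...] applies to the source right before it
        for negate, status, action in _CRITERION.findall(token[1:-1]):
            status, action = status.upper(), action.lower()
            # [!STATUS=action] applies the action to every other status
            targets = [s for s in STATUSES if s != status] if negate else [status]
            for target in targets:
                actions[target] = action
    return sources


def lookup(spec: str, answers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Walk the sources and return the (source, status) pairs actually consulted.

    `answers` maps a source name to the status it returns. A source missing
    from `answers` models a module that cannot be loaded, counted as UNAVAIL.
    Only "return" stops the walk; "merge" is not modelled.
    """
    trace: List[Tuple[str, str]] = []
    for source, actions in parse_sources(spec):
        status = answers.get(source, "UNAVAIL")
        trace.append((source, status))
        if actions[status] == "return":
            break
    return trace


# Assumed hosts line in the style nss-mdns recommends (not the generated file)
SPEC = "files mdns_minimal [NOTFOUND=return] dns mdns"
# mdns_minimal loaded and authoritative: the walk stops there
print(lookup(SPEC, {"files": "NOTFOUND", "mdns_minimal": "NOTFOUND"}))
# → [('files', 'NOTFOUND'), ('mdns_minimal', 'NOTFOUND')]
# mdns_minimal not loadable: counted as UNAVAIL, so the walk falls through to dns
print(lookup(SPEC, {"files": "NOTFOUND", "dns": "NOTFOUND"}))
# → [('files', 'NOTFOUND'), ('mdns_minimal', 'UNAVAIL'), ('dns', 'NOTFOUND'), ('mdns', 'UNAVAIL')]
```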