Federation errors when DNS server is on same IP #8118
Comments
Is this a setup that was previously working and recently broke, or is it a new setup? Asking in #synapse:matrix.org might be a good place to start. It sounds like something is broken in the docker config -- it could be interesting to modify the docker file to simply run a DNS lookup and see if that works. This sounds somewhat like #5846; you do mention the address isn't link-local, but maybe it runs into the same issue eventually?
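For reference, a minimal sketch of the kind of check being suggested, runnable from inside the container with the Python interpreter the image ships (the target domain is just a placeholder, not necessarily one of the failing ones):

```python
# Minimal DNS check using Twisted (the library Synapse uses for lookups).
from twisted.internet import reactor
from twisted.names import client

def on_result(result):
    # Prints either the resolved address or the Failure.
    print("lookup result:", result)
    reactor.stop()

# getHostByName resolves via the normal resolver configuration,
# i.e. /etc/resolv.conf inside the container.
d = client.getHostByName("matrix.org")
d.addBoth(on_result)
reactor.run()
```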
I recently changed the DNS setup and it appears that's when synapse broke (once restarted for the recent version update), so I'm sure it's related to changing the DNS servers. Setting the DNS servers to a different one on the container didn't help either. Sadly I can't seem to run anything inside the docker image that does DNS lookups, since it's a pretty slim image.
I have tried the script from the twisted bug on the host machine and everything seems fine there with any of the DNS IPs the container could possibly be getting, so it must be something in combination with the docker network -- though I know for a fact other containers have no issue accessing services running on the same machine via the local network IP.
I've tried the twisted snippet via a python repl in docker exec and it seems fine for the IPv4 address, even though explicitly defining it on the container didn't help. It fails for the IPv6 address with "can't assign requested address" -- possibly the docker network not having full IPv6. However, I'm sure this is unrelated to IPv6, given that resolv.conf only had IPv4 addresses in it and it failed for domains that only resolved to A records (and/or had a SRV record that pointed to something with an A record). The other case where I managed to get it to actually time out was when using the local docker network IP.
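For comparison, a resolver can also be pointed at one explicit DNS server address, which is roughly what the test above amounts to (the IP below is a placeholder for the host's IPv4 or IPv6 address):

```python
# Sketch: query one specific DNS server, bypassing resolv.conf entirely.
# 192.0.2.53 is a placeholder address; substitute the host's real one.
from twisted.internet import reactor
from twisted.names import client

resolver = client.Resolver(servers=[("192.0.2.53", 53)])

def on_result(result):
    print("lookup result:", result)
    reactor.stop()

resolver.getHostByName("matrix.org").addBoth(on_result)
reactor.run()
```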
Hey, sorry it took some time to get back to you. Would you be able to turn Synapse's logging to DEBUG?
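Assuming a standard Synapse setup, that is done in the log config file referenced by `log_config` in homeserver.yaml, roughly like this (only the relevant fragment shown):

```yaml
# Fragment of the Synapse log config (a standard Python logging dictConfig in
# YAML). Raising the root level to DEBUG makes federation/DNS activity visible.
root:
  level: DEBUG
  handlers: [buffer]   # keep whatever handlers your existing config already lists
```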
With debug logging it just seems slightly more verbose, but I don't see anything useful in addition to the basic issue.
Interestingly, it seems like some URLs are resolvable, since it does a bunch of .well-known lookups which look OK before it reaches the logs above.
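For context, the .well-known lookups mentioned above are HTTPS fetches of /.well-known/matrix/server for the remote server name, which may delegate federation to another host. A rough sketch of one such check (the domain is a placeholder):

```python
# Rough equivalent of a Matrix .well-known delegation lookup.
import json
import urllib.request

url = "https://example.org/.well-known/matrix/server"
with urllib.request.urlopen(url, timeout=10) as resp:
    # A delegating server returns something like {"m.server": "matrix.example.org:443"}
    print(json.load(resp))
```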
Between the .well-known lookups above and the first DNS errors there is one more federation log entry.
Other than that there's just one other log line, and I don't know exactly what that one means.
I've enabled query logging on my local DNS server and it seems like synapse looks up the domains with the DNS server every time it tries to connect to something -- no DNS cache at all? Either way, I only see successful queries there. Some of the ones where the DNS lookup fails that I've randomly chosen don't even show up in there. However, the resolver config for the container also allows requests to my router, so that may be why some don't show up.
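If the local DNS server is dnsmasq (as the later comments suggest), this sort of query logging is typically enabled with a fragment like:

```
# dnsmasq.conf fragment: log every query, and where to write the log.
log-queries
log-facility=/var/log/dnsmasq.log
```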
Interesting that the .well-known lookups seem to work. Looking at the source of the resolver, it should just be using the container's /etc/resolv.conf -- does that look right inside the container?
There are .well-known lookups failing though? The resolv.conf inside the container looks fine, and as mentioned earlier I did run the twisted.names resolver snippet from the other issue inside the container and it worked fine.
Oh I see, sorry. The only thing I can really think of is that DNS lookups are happening slowly enough that a backlog is building up and they are then timing out. (UDP lookups might be failing, causing a fallback to TCP?) I wonder if a tcpdump on the traffic from the docker container would be informative.
I got around the slow DNS queries by increasing the dnsmasq cache size to be greater than the number of failing DNS requests in my log (the latter was around 2000). I also increased the number of simultaneous connections to 300 from the default of 150, as I sometimes saw that issue popping up in the dnsmasq logs. However, the root cause might have been that my server was sending a lot of data on start-up, which was increasing latency and making it hard to catch up (#8338). If this stems from the same root cause, you might have luck increasing this timeout, without necessarily having to change your DNS setup (I doubled it to 120s).
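For anyone hitting the same thing, the dnsmasq side of that workaround looks roughly like this (the cache size is an example value; both options default to 150):

```
# dnsmasq.conf fragment: enlarge the cache beyond the ~2000 failing queries
# observed, and allow more concurrent forwarded queries than the default 150.
cache-size=4096
dns-forward-max=300
```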
Increasing the DNS cache size on the DNS side seems to work around this, indeed.
Sounds like a problem on the DNS server side then.
Description
I'm running the DNS server that synapse uses to resolve names on the same machine as synapse itself. As such, the DNS server's IPv4 and IPv6 addresses (the IPv6 address is not link-local, and I don't even see it in the resolv.conf inside the docker container) are identical to synapse's own addresses on the network.
Steps to reproduce
All HTTP requests by synapse fail because it cannot resolve any domain names; nslookup on the host machine is fine. I'm quite certain these steps to reproduce aren't fully reduced, but I'd rather have a working synapse than try to find the exact case that breaks it.
Version information
If not matrix.org:
Version: 1.19.0
Install method: docker