This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Server misbehaving #298

Closed
rafaelcapucho opened this issue Oct 4, 2015 · 17 comments
Comments

@rafaelcapucho

Hello,

I set up Mesos-DNS on a Linode and prepended my 2 Mesos-DNS instances to resolv.conf:

domain members.linode.com
search members.linode.com
nameserver 72.14.179.25 # Atlas004
nameserver 198.58.99.72 # Atlas005
nameserver 23.239.24.5
nameserver 173.255.199.5
nameserver 66.228.53.5
nameserver 96.126.122.5
nameserver 96.126.124.5
nameserver 96.126.127.5
nameserver 198.58.107.5
nameserver 198.58.111.5
nameserver 72.14.179.5
nameserver 72.14.188.5
#options rotate

It works well: dig works and master.mesos resolves properly. But there are a lot of errors in the Mesos sandbox log, many per second, like this:

error exchanging with "23.239.24.5:53": dial udp: lookup dns.jxnc.cncnet.net. on 198.58.111.5:53: server misbehaving

The config used:

{
  "zk": "zk://atlas001:2181,atlas002:2181,atlas003:2181/mesos",
  "masters": ["atlas001:5050", "atlas002:5050", "atlas003:5050"],
  "refreshSeconds": 60,
  "ttl": 120,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["173.255.199.5","66.228.53.5","96.126.122.5","96.126.124.5","96.126.127.5","198.58.107.5","198.58.111.5","23.239.24.5","72.14.179.5","72.14.188.5","8.8.8.8"],
  "timeout": 15,
  "email": "root.mesos-dns.mesos"
}

All IPs except the last are Linode DNS servers. I'm launching it through Marathon with this app definition:

{
  "cmd": "/bin/mesos-dns -config=/mnt/mesos/sandbox/config.json",
  "cpus": 0.2,
  "mem": 50,
  "ports": [53],
  "id": "mesos-dns",
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "nexusedge/docker-mesos-dns",
      "network": "HOST",
      "privileged": true
    }
  },
  "uris": [
    "http://blog.rafaelcapucho.com/download/config.json"
  ],
  "constraints": [["rack_id", "LIKE", "atlas00[4-5]"],["hostname","UNIQUE"]]
}

Dockerfile:

FROM ubuntu:14.04
MAINTAINER Rafael Capucho <[email protected]>

RUN apt-get update && apt-get -y install wget gzip && wget https://github.com/mesosphere/mesos-dns/releases/download/v0.3.0/mesos-dns-v0.3.0-linux-amd64.gz -O /tmp/mesos-dns.gz

RUN gzip -d /tmp/mesos-dns.gz && rm -f /tmp/mesos-dns.gz && chmod +x /tmp/mesos-dns* && ln -s /tmp/mesos-dns* /bin/mesos-dns

EXPOSE 53
CMD ["mesos-dns"]

Thank you

@rafaelcapucho
Author

It looks like the Linode servers 23.239.24.5 and 198.58.111.5 are down. I removed them from the list and the problem stopped.

Thank you

@rafaelcapucho
Author

The solution I proposed earlier didn't solve the problem. After a while, errors started appearing in stderr again, like this:

ERROR: 2015/10/04 19:37:38 exchanger.go:64: error exchanging with "173.255.199.5:53": dial udp: lookup ns.98.113.in-addr.arpa. on 173.255.199.5:53: no such host
ERROR: 2015/10/04 19:37:38 exchanger.go:64: error exchanging with "173.255.199.5:53": dial udp: lookup ns.98.113.in-addr.arpa. on 173.255.199.5:53: no such host
ERROR: 2015/10/04 19:37:38 exchanger.go:64: error exchanging with "173.255.199.5:53": dial udp: lookup ns.98.113.in-addr.arpa. on 173.255.199.5:53: no such host
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "23.239.24.5:53": read udp 72.14.179.25:34364->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:39073->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 resolver.go:280: dns.whu.edu.cn.
ERROR: 2015/10/04 19:37:40 resolver.go:281: failed external DNS lookup of "dns.whu.edu.cn.": read udp 72.14.179.25:39073->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "173.255.199.5:53": read udp 72.14.179.25:39450->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "173.255.199.5:53": read udp 72.14.179.25:57648->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "198.58.107.5:53": read udp 72.14.179.25:55128->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "23.239.24.5:53": read udp 72.14.179.25:54867->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "72.14.179.5:53": read udp 72.14.179.25:40865->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "96.126.124.5:53": read udp 72.14.179.25:40180->202.114.64.5:53: i/o timeout
ERROR: 2015/10/04 19:37:40 exchanger.go:64: error exchanging with "96.126.124.5:53": read udp 72.14.179.25:36342->202.114.64.5:53: i/o timeout

And the log file is becoming huge.

How could I solve it? Thank you.

@rafaelcapucho rafaelcapucho reopened this Oct 4, 2015
@tsenart
Contributor

tsenart commented Oct 5, 2015

Hello @rafaelcapucho,

Would you please verify all the IPs you configured as the resolvers are reachable on port 53? I can't reach them from my machine.
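A successful ping only shows that a host is up, not that it answers DNS on port 53. A minimal sketch of such a check, using only the Python standard library, is below. The function names, the probe name `example.com`, and the 2-second timeout are illustrative assumptions, not part of Mesos-DNS.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte.
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def answers_on_port_53(resolver: str, name: str = "example.com",
                       timeout: float = 2.0) -> bool:
    """Return True if the resolver sends back a DNS response over UDP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    # A valid reply echoes the query ID and has the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling `answers_on_port_53("8.8.8.8")` for each entry in `resolvers` from the host that runs Mesos-DNS would show which ones actually respond from there.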

Two remarks:

  • /etc/resolv.conf has a usual limit of 3 nameservers in Linux. All others will be ignored: http://linux.die.net/man/5/resolv.conf
  • I don't think there is a need for you to configure that many resolver IPs. Why don't you try just the public Google ones? ["8.8.8.8", "8.8.4.4"]
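To illustrate the first remark, here is a small sketch that splits the `nameserver` entries of a resolv.conf into those a resolver with a limit of 3 would use and those it would ignore. The limit is taken from the man page linked above. The helper name and the simplified comment handling (everything after `#` is dropped) are assumptions made for illustration, and the sample is an excerpt of the file posted earlier in this thread.

```python
MAXNS = 3  # assumed limit, per the resolv.conf(5) man page


def split_nameservers(resolv_conf_text, limit=MAXNS):
    """Return (used, ignored) nameserver addresses in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:limit], servers[limit:]


SAMPLE = """\
domain members.linode.com
search members.linode.com
nameserver 72.14.179.25 # Atlas004
nameserver 198.58.99.72 # Atlas005
nameserver 23.239.24.5
nameserver 173.255.199.5
nameserver 66.228.53.5
#options rotate
"""

used, ignored = split_nameservers(SAMPLE)
print("used:   ", used)
print("ignored:", ignored)
```

With this excerpt, only the two Mesos-DNS hosts and the first Linode resolver fall within the limit.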

@rafaelcapucho
Author

@tsenart Hello!

Yes, I can ping all those DNS servers from my Linodes.

I changed the configuration as you suggested.
Now my resolv.conf has only:

nameserver 72.14.179.25 # Atlas004
nameserver 198.58.99.72 # Atlas005
nameserver 8.8.8.8

And Mesos-DNS:

{
  "zk": "zk://atlas001:2181,atlas002:2181,atlas003:2181/mesos",
  "masters": ["atlas001:5050", "atlas002:5050", "atlas003:5050"],
  "refreshSeconds": 60,
  "ttl": 120,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["8.8.8.8", "8.8.4.4"],
  "timeout": 30,
  "email": "root.mesos-dns.mesos"
}

But now I'm getting an even bigger log volume, with errors like this:

ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:50739->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.4.4:53": read udp 72.14.179.25:33412->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 resolver.go:280: hehehey.ru.
ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:33412->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": EOF
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.4.4:53": read udp 72.14.179.25:42109->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 resolver.go:280: hehehey.ru.
ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:42109->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.4.4:53": read udp 72.14.179.25:44076->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 resolver.go:280: hehehey.ru.
ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:44076->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.4.4:53": read udp 72.14.179.25:34793->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 resolver.go:280: hehehey.ru.
ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:34793->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.4.4:53": read udp 72.14.179.25:54166->8.8.4.4:53: i/o timeout
ERROR: 2015/10/05 13:38:24 resolver.go:280: hehehey.ru.

And at other times like this:

ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:44366->8.8.8.8:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:33414->8.8.8.8:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:42670->8.8.8.8:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:41993->8.8.8.8:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:39557->8.8.8.8:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:48565->8.8.8.8:53: i/o timeout
ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.8.8:53": read udp 72.14.179.25:58714->8.8.8.8:53: i/o timeout

Thank you for your help. This is very important because we are almost finished configuring our production cluster.

@rafaelcapucho
Author

  1. I tested resolving external names using dig:
dig external-site.com @72.14.179.25
dig another-external-site.com @8.8.8.8

Both worked.

  2. My iptables rule set is empty.

@tsenart
Contributor

tsenart commented Oct 5, 2015

Tonight I won't be able to spend time on this but it'll be my priority tomorrow. It might be related to #297.

@tsenart
Contributor

tsenart commented Oct 5, 2015

@rafaelcapucho: From the timestamps of the logs you've posted, it seems that you have a high rate of queries for the same names. I suspect Google is rate limiting you: https://developers.google.com/speed/public-dns/docs/security#rate_limit
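One way to quantify that rate is to count failed lookups per name and per second straight from the log. The sketch below assumes the log line format shown in this thread. The function name is made up for illustration, and the sample lines are copied from the log posted above.

```python
import re
from collections import Counter

# Matches lines such as:
# ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "name.": ...
FAILED = re.compile(
    r'^ERROR: (\S+ \S+) resolver\.go:\d+: failed external DNS lookup of "([^"]+)"'
)


def failures_per_name_and_second(lines):
    """Count failed external lookups keyed by (timestamp, queried name)."""
    counts = Counter()
    for line in lines:
        match = FAILED.match(line)
        if match:
            counts[match.groups()] += 1
    return counts


SAMPLE_LOG = [
    'ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:50739->8.8.4.4:53: i/o timeout',
    'ERROR: 2015/10/05 13:38:24 exchanger.go:64: error exchanging with "8.8.4.4:53": read udp 72.14.179.25:33412->8.8.4.4:53: i/o timeout',
    'ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:33412->8.8.4.4:53: i/o timeout',
    'ERROR: 2015/10/05 13:38:24 resolver.go:281: failed external DNS lookup of "hehehey.ru.": read udp 72.14.179.25:42109->8.8.4.4:53: i/o timeout',
]

for (timestamp, name), count in failures_per_name_and_second(SAMPLE_LOG).most_common():
    print(timestamp, name, count)
```

Run against the full stderr file, the same counter would show which names dominate and how many failures land in each second.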

@rafaelcapucho
Author

@tsenart: That makes a lot of sense, but I was getting the same behaviour using the Linode DNS servers, and we don't know whether they apply rate limiting.

The strangest part is that the 5 machines are nearly idle: all they run is Mesos, Marathon, Docker, Mesos-DNS, and 1 Flask app.

What could cause so many requests? (I don't even know what hehehey.ru is.)

And after the first lookup against Google DNS, we'd expect the host to store the answer in a local cache, right?

@tsenart
Contributor

tsenart commented Oct 5, 2015

External queries aren't being cached in Mesos-DNS at this point. Try to trace where all of those queries are coming from.
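A rough sketch of such tracing: capture traffic with `tcpdump -n port 53` and count incoming queries per source address and name. The regular expression assumes tcpdump's usual one-line text output for DNS queries, which can vary between versions. The sample lines are invented for illustration (they use documentation address ranges) and are not taken from the capture in this thread.

```python
import re
from collections import Counter

# Assumed tcpdump -n line shape for a query sent to port 53:
# 13:38:24.000001 IP 203.0.113.7.50739 > 72.14.179.25.53: 4242+ A? hehehey.ru. (28)
QUERY = re.compile(
    r"IP (\d+(?:\.\d+){3})\.\d+ > \d+(?:\.\d+){3}\.53: .*?(\w+)\? (\S+)"
)


def queries_by_source(lines):
    """Count DNS queries keyed by (source IP, queried name)."""
    counts = Counter()
    for line in lines:
        match = QUERY.search(line)
        if match:
            source, _qtype, name = match.groups()
            counts[(source, name)] += 1
    return counts


# Hypothetical capture lines, for illustration only.
SAMPLE_CAPTURE = [
    "13:38:24.000001 IP 203.0.113.7.50739 > 72.14.179.25.53: 4242+ A? hehehey.ru. (28)",
    "13:38:24.000002 IP 203.0.113.7.50740 > 72.14.179.25.53: 4243+ A? hehehey.ru. (28)",
    "13:38:24.000003 IP 198.51.100.9.41000 > 72.14.179.25.53: 77+ A? master.mesos. (30)",
    "13:38:24.000004 IP 72.14.179.25.53 > 203.0.113.7.50739: 4242 ServFail 0/0/0 (28)",
]

for (source, name), count in queries_by_source(SAMPLE_CAPTURE).most_common():
    print(source, name, count)
```

If most queries come from addresses outside the cluster, the server is answering the public internet rather than only its own hosts.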

@rafaelcapucho
Author

Results from a few milliseconds of running:

tcpdump -n -n port 53:
https://paste.ee/r/vsYjE

netstat -apn|grep -w 53:
[screenshot of the netstat output; not preserved]

72.14.184.154   atlas001
72.14.177.159   atlas002
45.33.4.187     atlas003
72.14.179.25    atlas004
198.58.99.72    atlas005

I don't know how to interpret most of these results. What if I used iptables to allow queries on port 53 only from my specific IPs? Thank you.
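As a sketch of that idea, the script below prints iptables commands that accept port 53 traffic from the five atlas hosts listed above and drop it from everyone else. It only generates text and applies nothing. The rule order and the use of the INPUT chain are assumptions that have not been tested on this cluster, so review them before running anything.

```python
# Addresses of atlas001..atlas005, as listed in this comment.
ALLOWED_SOURCES = [
    "72.14.184.154",
    "72.14.177.159",
    "45.33.4.187",
    "72.14.179.25",
    "198.58.99.72",
]


def build_rules(allowed_sources, port=53):
    """Return iptables commands: allow listed sources on the port, drop the rest."""
    # Keep loopback traffic working for local lookups.
    rules = ["iptables -A INPUT -i lo -j ACCEPT"]
    for source in allowed_sources:
        for proto in ("udp", "tcp"):
            rules.append(
                f"iptables -A INPUT -p {proto} -s {source} --dport {port} -j ACCEPT"
            )
    # Everything else aimed at the DNS port is dropped.
    for proto in ("udp", "tcp"):
        rules.append(f"iptables -A INPUT -p {proto} --dport {port} -j DROP")
    return rules


for rule in build_rules(ALLOWED_SOURCES):
    print(rule)
```

Because rules are evaluated in order, the ACCEPT entries have to come before the final DROP entries.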

tsenart pushed a commit that referenced this issue Oct 6, 2015
While debugging #297 and #298, it became apparent that this extra
information would be very useful in aiding understanding of the issue.
@rafaelcapucho
Author

@tsenart Hello,

Whenever you think the problem is solved and would like to test it in my environment, just tell me, especially if you can provide a binary compiled for 64-bit Ubuntu.

Thank you very much for your effort!!!

@tsenart
Contributor

tsenart commented Oct 7, 2015

@rafaelcapucho: I will, thanks for your patience.

@tsenart
Contributor

tsenart commented Oct 7, 2015

Can you try running https://github.com/mesosphere/mesos-dns/releases/tag/v0.4.0-pre and see how it behaves?

@tobilg

tobilg commented Oct 8, 2015

I have a similar problem with v0.3.0. Will test v0.4.0pre as well and give some feedback... Thanks!

@tobilg

tobilg commented Oct 8, 2015

The error messages are gone in my environment now. I'm still having an issue with forwarding to Marathon app ports, but I will open another issue for that. Thanks for the fast fix.

@rafaelcapucho
Author

netstat -apn|grep -w 53 shows that it isn't flooded anymore.

Some log messages still appear, but they are directly related to external requests to my DNS server.

Thank you, @tsenart, for your effort and energy!
Closing the issue.
