Autoclustering broke in master #1013

Closed
errordeveloper opened this issue Feb 23, 2016 · 19 comments

@errordeveloper
Contributor

Using df2c21e.

I have 7 nodes and 24 A records in DNS:

weave dns-lookup scope
10.17.0.75
10.17.0.78
10.32.0.1
10.17.0.77
10.44.0.0
10.40.0.1
10.17.0.81
10.40.0.1
159.203.67.181
159.203.101.140
159.203.121.136
10.36.0.3
10.38.0.1
10.17.0.76
10.34.0.1
159.203.113.192
10.46.0.0
159.203.116.168
159.203.117.174
10.17.0.79
159.203.67.181
10.17.0.80
104.236.254.144
10.17.0.75
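
Since each host is said later in this thread to register 3 records, it may be worth checking whether the listing above contains repeated addresses. A minimal sketch, assuming the lookup prints one address per line; `count_records` is a hypothetical helper name, not part of weave or scope:

```shell
# count_records: read one address per line on stdin and report how many
# non-empty lines there are in total and how many of them are distinct.
# (Hypothetical helper for illustration; not part of weave or scope.)
count_records() {
  awk 'NF { total++; if (!seen[$1]++) unique++ }
       END { printf "total=%d unique=%d\n", total, unique }'
}

# Example (needs a running weave router, so it is left commented out):
#   weave dns-lookup scope | count_records
```

If `unique` comes out lower than `total`, some records are registered more than once, which might point at stale entries left over from the previous version.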

I'm also seeing a bunch of WebSocket errors:

<app> INFO: 2016/02/23 16:30:12.510322 app starting, version df2c21e, ID 17f9bd301e68cf61
<probe> INFO: 2016/02/23 16:30:12.515089 probe starting, version df2c21e, ID 50b26fc69b6e7fee
<probe> INFO: 2016/02/23 16:30:12.515160 publishing to: localhost:4040
<app> INFO: 2016/02/23 16:30:12.518905 listening on :4040
<probe> INFO: 2016/02/23 16:30:12.529847 Control connection to 127.0.0.1:4040 starting
<probe> INFO: 2016/02/23 16:30:12.544301 docker container: collecting stats for 9b7199f981db026bed50512084b57a06ec5d9dd9bcd52077740b446477147552
<probe> INFO: 2016/02/23 16:30:12.546494 docker container: collecting stats for c0a37dda35b63143049e83d5797cf2392b834dd896444d610220063776af6f69
<probe> INFO: 2016/02/23 16:30:12.549708 docker container: collecting stats for 37aa24a556afd32970194f20b37de04f675fc96960ce6e5a1e29b92af5616348
<probe> INFO: 2016/02/23 16:30:12.550000 Publish loop for 127.0.0.1:4040 starting
<probe> INFO: 2016/02/23 16:30:12.551179 docker container: collecting stats for 888fc5936e00aa25b1449d6f9d5434456d46d8757a90ee6d2dc1ddc02e3a740d
<probe> INFO: 2016/02/23 16:30:12.552962 docker container: collecting stats for cec1a646c6859d4d3211724eef02dc02b70f3ab2a157cc6d13c3c1244af71103
<probe> INFO: 2016/02/23 16:30:12.554731 docker container: collecting stats for beeef7228b10b3152853f4c97506efddff5b74bd718a43d92253986ee0380826
<probe> INFO: 2016/02/23 16:30:12.558779 docker container: collecting stats for f2884f183cace00aa2431a4c089ce624ff01da986b91ca0e069a1320826a697b
<probe> INFO: 2016/02/23 16:30:12.561428 docker container: collecting stats for 1d357220874a9a6507fe066ef2db3215bd07403e2213b43d87b282ad999c1e48
<probe> INFO: 2016/02/23 16:30:12.569743 docker container: collecting stats for db935fe44b3d93587c2592459b00d74f9f221bd8b708147ce3ab7aca0135bf3a
<probe> INFO: 2016/02/23 16:30:12.574016 docker container: collecting stats for 87dd47dbcb6ac84dafe66eb32aa9e243d0501707b38c35126825973ccb3e4f04
<app> INFO: 2016/02/23 16:30:12.583393 Success updating weaveDNS
<probe> INFO: 2016/02/23 16:30:12.721139 docker container: collecting stats for 4ac2c5b62706d16ecb0aab74c2196016f13424a294f77226bb2fab7eff08c109
<probe> ERRO: 2016/02/23 16:30:12.735247 Error checking version: Unknown status: 400
<probe> INFO: 2016/02/23 16:30:12.880065 Success collecting weave info
<probe> ERRO: 2016/02/23 16:30:12.887196 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/23 16:30:12.887319 docker container: stopped collecting stats for 4ac2c5b62706d16ecb0aab74c2196016f13424a294f77226bb2fab7eff08c109
<app> INFO: 2016/02/23 16:31:13.394586 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:31:27.412467 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:31:28.905733 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:31:29.725439 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:31:34.176992 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:31:59.308055 err: websocket: close 1001 
<app> INFO: 2016/02/23 16:32:04.233627 err: websocket: close 1001 
<app> INFO: 2016/02/23 16:32:09.288416 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:32:10.495563 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:32:11.351299 err: websocket: close 1005 
<probe> INFO: 2016/02/23 16:33:45.765915 docker container: collecting stats for 4e0d81f006a42d294a3aaea31edfc169cd775242a9efe02f1da478ea47efa81a
<probe> ERRO: 2016/02/23 16:33:46.232719 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/23 16:33:46.232757 docker container: stopped collecting stats for 4e0d81f006a42d294a3aaea31edfc169cd775242a9efe02f1da478ea47efa81a
<probe> INFO: 2016/02/23 16:33:53.902385 docker container: collecting stats for 42032fbda7b38762e628a32a3fab793144156a61e3cc7c21fe0f060ba0fec853
<probe> ERRO: 2016/02/23 16:33:54.065129 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/23 16:33:54.065171 docker container: stopped collecting stats for 42032fbda7b38762e628a32a3fab793144156a61e3cc7c21fe0f060ba0fec853
<probe> INFO: 2016/02/23 16:34:04.654315 docker container: collecting stats for 4611039db6fabf719017856c38e19c2931188de9a576bc6cf082ed9a0eb8d07d
<probe> ERRO: 2016/02/23 16:34:04.781010 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/23 16:34:04.781036 docker container: stopped collecting stats for 4611039db6fabf719017856c38e19c2931188de9a576bc6cf082ed9a0eb8d07d
<probe> INFO: 2016/02/23 16:34:10.502724 docker container: collecting stats for 74e4a9ce192b6627e85d7ccd0ccfce5c3d7db661365374234bc1c3dc15b4975f
<probe> INFO: 2016/02/23 16:34:10.646155 docker container: stopped collecting stats for 74e4a9ce192b6627e85d7ccd0ccfce5c3d7db661365374234bc1c3dc15b4975f
<app> INFO: 2016/02/23 16:34:41.379839 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:34:46.124803 err: websocket: close 1005 
<app> INFO: 2016/02/23 16:34:49.204918 err: websocket: close 1005 
<probe> INFO: 2016/02/23 16:34:57.460021 docker container: collecting stats for a047b28d99c92b710b03b490f20a0a06301b94704ffa01192df427dc1036597a
<probe> ERRO: 2016/02/23 16:34:57.774014 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/23 16:34:57.774066 docker container: stopped collecting stats for a047b28d99c92b710b03b490f20a0a06301b94704ffa01192df427dc1036597a
@errordeveloper
Contributor Author

Just to be clear, I only see one host in each of the apps.

@errordeveloper errordeveloper added the bug Broken end user or developer functionality; not working as the developers intended it label Feb 23, 2016
@errordeveloper
Contributor Author

I did have 0.12 running before this and ran this on each host to upgrade:

./scope stop
curl -sL https://raw.githubusercontent.com/weaveworks/scope/master/scope > ./scope
./scope launch

@tomwilkie tomwilkie added this to the 0.13.0 milestone Feb 24, 2016
@2opremio
Contributor

After looking at https://github.com/weaveworks/scope/pull/867/files#diff-79e626e243584a7a4f65f233eca99889R59 I would say the problem is that #867 broke search domains.

Instead of using the package-level client I think we should use ClientConfigFromFile() initially and then modify the config if we detect the weave network.

@2opremio
Contributor

Actually, we use a (hardcoded) FQDN (scope.weave.works). @errordeveloper are you by any chance using a non-default Weave domain?

@paulbellamy
Contributor

you probably mean scope.weave.local (not scope.weave.works)

@paulbellamy
Contributor

Asked @errordeveloper in person who said he is not using a custom weave domain.

@paulbellamy
Contributor

cannot reproduce...

@2opremio
Contributor

> you probably mean scope.weave.local (not scope.weave.works)

Yep, sorry.

@errordeveloper Can you come up with a self-contained repro we can use?

@errordeveloper
Contributor Author

Step 1: create 7 DigitalOcean machines (you need to export DIGITALOCEAN_ACCESS_TOKEN first)

#!/bin/bash -xe

vm_names=$(seq -f 'test-%g' 1 7)

install_weave=" \
  sudo curl --silent --location http://git.io/weave --output /usr/local/bin/weave ; \
  sudo chmod +x /usr/local/bin/weave ; \
  /usr/local/bin/weave launch --init-peer-count 7 ; \
"

install_scope=" \
  sudo curl --silent --location http://git.io/scope --output /usr/local/bin/scope ; \
  sudo chmod +x /usr/local/bin/scope ; \
  /usr/local/bin/scope launch ; \
"

for m in $vm_names ; do
  docker-machine create --driver digitalocean ${m}
  docker-machine ssh ${m} "${install_weave}"
  docker-machine ssh ${m} "${install_scope}"
done

for m in $vm_names ; do
  docker-machine ssh ${m} "/usr/local/bin/weave connect $(docker-machine ip 'test-1')"
done

Step 3: access the app on any of the hosts and make sure there are 7 nodes in the hosts view
Step 4: upgrade to the latest build of Scope

#!/bin/bash -xe

vm_names=$(seq -f 'test-%g' 1 7)

upgrade_scope=" \
  /usr/local/bin/scope stop ; \
  sudo curl --silent --location https://raw.githubusercontent.com/weaveworks/scope/master/scope --output /usr/local/bin/scope ; \
  sudo chmod +x /usr/local/bin/scope ; \
  /usr/local/bin/scope launch ; \
"

for m in $vm_names ; do
  docker-machine ssh ${m} "${upgrade_scope}"
done

@errordeveloper
Contributor Author

Looks like 5 hosts is already enough to break it... will try 4 now.

@errordeveloper
Contributor Author

Can someone look into the implementation details? Maybe the underlying DNS library doesn't handle too many records? We do have 3 records for each host, you know...
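
One way to sanity-check the record-count hypothesis is to estimate the size of the DNS reply against the 512-byte limit that plain DNS over UDP (without EDNS0) imposes. This is a rough sketch only: `dns_reply_size` is a hypothetical helper, and whether weaveDNS compresses owner names in its answers is not established in this thread, so both cases can be computed:

```shell
# dns_reply_size NAME RECORDS [compressed]
# Estimate the wire size in bytes of a DNS reply carrying RECORDS A records
# for NAME. Pass "compressed" as the third argument to assume each answer's
# owner name is a 2-byte compression pointer; otherwise the full name is
# assumed to be repeated in every answer. (Hypothetical helper.)
dns_reply_size() {
  name=${1%.}                       # drop a trailing dot, if any
  records=$2
  mode=${3:-uncompressed}
  name_len=$(( ${#name} + 2 ))      # length-prefixed labels plus root byte
  header=12                         # fixed DNS header
  question=$(( name_len + 4 ))      # QNAME + QTYPE + QCLASS
  if [ "$mode" = "compressed" ]; then owner=2; else owner=$name_len; fi
  answer=$(( owner + 10 + 4 ))      # owner + TYPE/CLASS/TTL/RDLENGTH + IPv4
  echo $(( header + question + records * answer ))
}

dns_reply_size scope.weave.local 12              # 4 hosts x 3 records
dns_reply_size scope.weave.local 15              # 5 hosts x 3 records
dns_reply_size scope.weave.local 15 compressed
```

Under the uncompressed assumption this gives 431 bytes for 12 records and 530 bytes for 15, which would put the 512-byte boundary between 4 and 5 hosts; with compression 15 records would only need 275 bytes. If the first case applies, a client that does not retry over TCP after a truncated UDP reply could end up with no usable answer.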

@errordeveloper
Contributor Author

Looks like it breaks with 5 hosts.

@errordeveloper
Contributor Author

With 0.12 I see this:

root@node-4:~# grep Publish scope.log
<probe> 2016/02/25 15:02:01 Publish loop for 127.0.0.1:4040 starting
<probe> 2016/02/25 15:02:07 Publish loop for 10.17.0.122:4040 starting # node-3 via weave
<probe> 2016/02/25 15:02:07 Publish loop for 104.236.54.23:4040 starting # node-2 via eth0
<probe> 2016/02/25 15:02:07 Publish loop for 104.236.212.38:4040 starting # node-3 via eth0
<probe> 2016/02/25 15:02:07 Publish loop for 104.131.33.221:4040 starting # node-1 via eth0
<probe> 2016/02/25 15:02:19 Publish loop for 45.55.139.9:4040 starting # node-5 via eth0
root@node-4:~# weave dns-lookup scope.weave.local
104.131.33.221
10.42.170.171
104.236.242.211
104.236.54.23
10.17.0.124
10.17.0.122
10.41.85.86
10.17.0.123
10.45.85.85
10.32.0.1
45.55.139.9
10.17.0.120
10.37.85.86
10.17.0.121
104.236.212.38
root@node-4:~# host scope.weave.local 172.17.0.1
;; Truncated, retrying in TCP mode.
Using domain server:
Name: 172.17.0.1
Address: 172.17.0.1#53
Aliases: 

scope.weave.local has address 10.17.0.122
scope.weave.local has address 10.37.85.86
scope.weave.local has address 10.17.0.124
scope.weave.local has address 104.236.54.23
scope.weave.local has address 45.55.139.9
scope.weave.local has address 10.41.85.86
scope.weave.local has address 104.236.242.211
scope.weave.local has address 10.17.0.123
scope.weave.local has address 10.45.85.85
scope.weave.local has address 10.42.170.171
scope.weave.local has address 104.236.212.38
scope.weave.local has address 104.131.33.221
scope.weave.local has address 10.32.0.1
scope.weave.local has address 10.17.0.120
scope.weave.local has address 10.17.0.121
Host scope.weave.local not found: 3(NXDOMAIN)
Host scope.weave.local not found: 3(NXDOMAIN)
docker-machine ls
NAME     ACTIVE   DRIVER         STATE     URL                          SWARM   DOCKER    ERRORS
node-1   -        digitalocean   Running   tcp://104.131.33.221:2376            v1.10.2   
node-2   -        digitalocean   Running   tcp://104.236.54.23:2376             v1.10.2   
node-3   -        digitalocean   Running   tcp://104.236.212.38:2376            v1.10.2   
node-4   -        digitalocean   Running   tcp://104.236.242.211:2376           v1.10.2   
node-5   -        digitalocean   Running   tcp://45.55.139.9:2376               v1.10.2   
root@node-4:~# weave ps
weave:expose d6:26:58:f0:11:33 10.45.85.85/12
root@node-4:~# ifconfig
datapath  Link encap:Ethernet  HWaddr 6a:3f:35:8f:ed:0a  
          inet6 addr: fe80::683f:35ff:fe8f:ed0a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          RX packets:21 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1374 (1.3 KB)  TX bytes:648 (648.0 B)

docker0   Link encap:Ethernet  HWaddr 02:42:07:ca:7b:2b  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 04:01:ae:af:97:01  
          inet addr:104.236.242.211  Bcast:104.236.255.255  Mask:255.255.192.0
          inet6 addr: fe80::601:aeff:feaf:9701/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:33463 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29386 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:112940486 (112.9 MB)  TX bytes:17855598 (17.8 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:10726 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10726 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:19830410 (19.8 MB)  TX bytes:19830410 (19.8 MB)

vethwe-bridge Link encap:Ethernet  HWaddr c6:7d:89:38:26:5d  
          inet6 addr: fe80::c47d:89ff:fe38:265d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          RX packets:5544 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5543 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3573252 (3.5 MB)  TX bytes:508157 (508.1 KB)

vethwe-datapath Link encap:Ethernet  HWaddr f2:a8:46:36:5f:33  
          inet6 addr: fe80::f0a8:46ff:fe36:5f33/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          RX packets:5543 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5544 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:508157 (508.1 KB)  TX bytes:3573252 (3.5 MB)

weave     Link encap:Ethernet  HWaddr d6:26:58:f0:11:33  
          inet addr:10.45.85.85  Bcast:0.0.0.0  Mask:255.240.0.0
          inet6 addr: fe80::d426:58ff:fef0:1133/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          RX packets:5542 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5535 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3495496 (3.4 MB)  TX bytes:507509 (507.5 KB)
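
The `Truncated, retrying in TCP mode` line in the `host` output above suggests the reply no longer fits in a single UDP datagram. A small sketch for comparing what a UDP-only client sees with what a TCP client sees, assuming `dig` is installed on the node; `compare_udp_tcp` is a hypothetical helper, and the server address is the Docker bridge IP quoted above:

```shell
# compare_udp_tcp NAME SERVER
# Count the A records returned over UDP only versus over TCP.
# (Hypothetical helper; relies on dig's +notcp, +ignore, +tcp and +short.)
compare_udp_tcp() {
  name=$1
  server=$2
  # +ignore: do not retry over TCP when the UDP reply has the TC bit set.
  udp=$(dig +short +notcp +ignore "$name" A "@$server" | grep -c . || true)
  # +tcp: force the query over TCP.
  tcp=$(dig +short +tcp "$name" A "@$server" | grep -c . || true)
  echo "udp=$udp tcp=$tcp"
}

# Example (needs a running weave router, so it is left commented out):
#   compare_udp_tcp scope.weave.local 172.17.0.1
```

A lower `udp` count than `tcp` count would be consistent with truncation; how many records a truncated reply still carries depends on the server, so the UDP number may well be zero.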

@errordeveloper
Contributor Author

Also, weave report for node-4 may be of use.

@2opremio
Contributor

That's already wrong. @errordeveloper has 5 nodes, yet the probe is reporting to 6 nodes, mixing up public interface IPs (104.* and 45.*) with a weave IP (10.17.0.122).
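
To make the mix of address families easier to spot, the publish targets can be split into private and public ranges. A first-pass sketch only: `classify_ip` is a hypothetical helper, and a private address by itself does not say whether it belongs to the weave network or to a provider-side private interface:

```shell
# classify_ip ADDR: print "private" for RFC 1918 IPv4 ranges, else "public".
# (Hypothetical helper for illustration.)
classify_ip() {
  case "$1" in
    10.*|192.168.*) echo private ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo private ;;
    *) echo public ;;
  esac
}

# Targets taken from the "Publish loop" lines quoted in this thread.
for ip in 10.17.0.122 104.236.54.23 104.236.212.38 104.131.33.221 45.55.139.9; do
  printf '%s %s\n' "$ip" "$(classify_ip "$ip")"
done
```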

@errordeveloper
Contributor Author

This is quite likely related to the upgrade sequence, as shown in #1013 (comment).

Once 0.12 is running, here is what the logs look like:

Exposing host to weave network.
10.45.85.85
Weave container detected at 127.0.0.1, Docker bridge at 172.17.0.1
<app> 2016/02/25 15:02:00 app starting, version 0.12.0, ID 537f662801a929d9
<app> 2016/02/25 15:02:00 listening on :4040
<probe> 2016/02/25 15:02:00 probe starting, version 0.12.0, ID 6869d818a99414b7
<probe> 2016/02/25 15:02:00 publishing to: localhost:4040, scope.weave.local:4040
<probe> 2016/02/25 15:02:01 Control connection to 127.0.0.1:4040 starting
<probe> 2016/02/25 15:02:01 Publish loop for 127.0.0.1:4040 starting
<probe> 2016/02/25 15:02:01 docker container: collecting stats for bb46a5d11641fb38f7ba70b985c60f2fd328044c5d14837cfc2320ef34c8f744
<probe> 2016/02/25 15:02:01 docker container: collecting stats for ea1a502b7566710abdfcaf988a850d68274b85557dc8934cf5f12a03939bcb77
<probe> 2016/02/25 15:02:01 docker container: collecting stats for a04912ef9726a656b764bb3d6831f827a7773cb36e4a9c3481fef09c5bada20c
<probe> 2016/02/25 15:02:01 docker container: collecting stats for c90bb027f76b34125fb3225ac5650356dfaf34c8b667b265f530569da8ae4b7a
<probe> 2016/02/25 15:02:04 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:04 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:04 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:04 Control connection to 10.17.0.122:4040 starting
<probe> 2016/02/25 15:02:04 Control connection to 104.236.54.23:4040 starting
<probe> 2016/02/25 15:02:04 Control connection to 104.236.212.38:4040 starting
<probe> 2016/02/25 15:02:04 Control connection to 104.131.33.221:4040 starting
<probe> 2016/02/25 15:02:07 Publish loop for 10.17.0.122:4040 starting
<probe> 2016/02/25 15:02:07 Publish loop for 104.236.54.23:4040 starting
<probe> 2016/02/25 15:02:07 Publish loop for 104.236.212.38:4040 starting
<probe> 2016/02/25 15:02:07 Publish loop for 104.131.33.221:4040 starting
<probe> 2016/02/25 15:02:07 Error doing controls for 10.17.0.122:4040, backing off 1s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:10 Error doing controls for 10.17.0.122:4040, backing off 2s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:13 Error doing controls for 10.17.0.122:4040, backing off 4s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:13 Error doing publish for 10.17.0.122:4040, backing off 1s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:16 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:17 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:17 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:17 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:17 Control connection to 45.55.139.9:4040 starting
<probe> 2016/02/25 15:02:19 Publish loop for 45.55.139.9:4040 starting
<probe> 2016/02/25 15:02:19 Error doing controls for 10.17.0.122:4040, backing off 8s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:19 Error doing publish for 10.17.0.122:4040, backing off 2s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:22 Error doing publish for 10.17.0.122:4040, backing off 4s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:27 Error doing controls for 10.17.0.122:4040, backing off 16s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:27 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:27 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:27 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:27 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:30 Error doing publish for 10.17.0.122:4040, backing off 8s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:37 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:37 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:37 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:37 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:40 Error doing publish for 10.17.0.122:4040, backing off 16s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:46 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:46 Error doing controls for 10.17.0.122:4040, backing off 32s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:47 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:47 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:47 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:57 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:57 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:57 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:02:57 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:00 Error doing publish for 10.17.0.122:4040, backing off 32s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:07 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:07 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:07 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:07 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:17 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:17 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:17 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:17 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:20 Error doing controls for 10.17.0.122:4040, backing off 1m0s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:27 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:27 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:27 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:27 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:37 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:37 Error doing publish for 10.17.0.122:4040, backing off 1m0s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:37 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:37 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:37 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:47 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:47 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:47 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:47 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:57 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:57 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:57 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:03:57 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:07 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:07 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:07 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:07 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:17 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:17 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:17 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:17 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:20 Error doing controls for 10.17.0.122:4040, backing off 1m0s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:27 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:27 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:27 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:27 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:37 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:37 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:37 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:37 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:40 Error doing publish for 10.17.0.122:4040, backing off 1m0s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:47 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:47 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:47 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:47 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:57 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:57 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:57 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:04:57 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:07 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:07 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:07 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:07 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:17 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:17 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:17 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:17 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:23 Error doing controls for 10.17.0.122:4040, backing off 1m0s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:26 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:27 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:27 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:27 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:37 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:37 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:37 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:37 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:46 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:46 Error doing publish for 10.17.0.122:4040, backing off 1m0s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:47 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:47 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:47 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:57 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:57 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:57 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:05:57 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:07 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:07 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:07 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:07 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:17 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:17 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:17 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:17 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:26 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:26 Error doing controls for 10.17.0.122:4040, backing off 1m0s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:27 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:27 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:27 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:37 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:37 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:37 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:37 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:47 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:47 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:47 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:47 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:50 Error doing publish for 10.17.0.122:4040, backing off 1m0s: Post http://10.17.0.122:4040/api/report: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:57 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:57 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:57 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:06:57 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:07 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:07 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:07 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:07 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:17 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:17 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:17 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:17 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:27 Error doing controls for 10.17.0.122:4040, backing off 1m0s: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:27 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:27 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:27 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:27 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:37 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:37 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:37 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:37 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:47 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:47 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:47 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> 2016/02/25 15:07:47 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host

After upgrading to the latest build (1bfbf67), the logs show probes exiting the publish loop:

<probe> INFO: 2016/02/25 15:19:16.745002 probe starting, version 1bfbf67, ID 1f2751e094d74910
<probe> INFO: 2016/02/25 15:19:16.745189 publishing to: localhost:4040
<app> INFO: 2016/02/25 15:19:16.738805 app starting, version 1bfbf67, ID 64fd378cc9782f19
<app> INFO: 2016/02/25 15:19:16.788179 listening on :4040
<probe> ERRO: 2016/02/25 15:19:16.852362 Error fetching app details: Get http://127.0.0.1:4040/api: dial tcp 127.0.0.1:4040: getsockopt: connection refused
<app> INFO: 2016/02/25 15:19:16.916548 Success updating weaveDNS
<probe> INFO: 2016/02/25 15:19:16.918181 docker container: collecting stats for b2fefb55d691d02cd70d52e9f379477f657a51d8a82201252b35971fff6e7a5d
<probe> INFO: 2016/02/25 15:19:16.946320 docker container: collecting stats for ea1a502b7566710abdfcaf988a850d68274b85557dc8934cf5f12a03939bcb77
<probe> INFO: 2016/02/25 15:19:17.013429 docker container: collecting stats for a04912ef9726a656b764bb3d6831f827a7773cb36e4a9c3481fef09c5bada20c
<probe> INFO: 2016/02/25 15:19:17.060018 docker container: collecting stats for c90bb027f76b34125fb3225ac5650356dfaf34c8b667b265f530569da8ae4b7a
<probe> INFO: 2016/02/25 15:19:17.139770 Success collecting weave info
<probe> ERRO: 2016/02/25 15:19:19.888493 Error fetching app details: Get http://10.17.0.124:4040/api: dial tcp 10.17.0.124:4040: getsockopt: no route to host
<probe> ERRO: 2016/02/25 15:19:19.900447 Error fetching app details: Get http://10.17.0.120:4040/api: dial tcp 10.17.0.120:4040: getsockopt: no route to host
<probe> ERRO: 2016/02/25 15:19:19.900548 Error fetching app details: Get http://10.17.0.121:4040/api: dial tcp 10.17.0.121:4040: getsockopt: no route to host
<probe> ERRO: 2016/02/25 15:19:19.900630 Error fetching app details: Get http://10.17.0.122:4040/api: dial tcp 10.17.0.122:4040: getsockopt: no route to host
<probe> INFO: 2016/02/25 15:19:19.900721 Control connection to 45.55.139.9:4040 starting
<probe> INFO: 2016/02/25 15:19:19.901322 Control connection to 10.32.0.1:4040 starting
<probe> INFO: 2016/02/25 15:19:19.901861 Control connection to 10.37.85.86:4040 starting
<probe> INFO: 2016/02/25 15:19:19.902199 Control connection to 104.236.54.23:4040 starting
<probe> ERRO: 2016/02/25 15:19:19.916526 Error doing controls for 45.55.139.9:4040, backing off 1s: dial tcp 45.55.139.9:4040: getsockopt: connection refused
<probe> ERRO: 2016/02/25 15:19:20.918037 Error doing controls for 45.55.139.9:4040, backing off 2s: dial tcp 45.55.139.9:4040: getsockopt: connection refused
<probe> INFO: 2016/02/25 15:19:22.870725 Publish loop for 45.55.139.9:4040 starting
<probe> INFO: 2016/02/25 15:19:22.870870 Publish loop for 10.32.0.1:4040 starting
<probe> INFO: 2016/02/25 15:19:22.870903 Publish loop for 10.37.85.86:4040 starting
<probe> INFO: 2016/02/25 15:19:22.870948 Publish loop for 104.236.54.23:4040 starting
<probe> ERRO: 2016/02/25 15:19:22.920287 Error doing controls for 45.55.139.9:4040, backing off 4s: dial tcp 45.55.139.9:4040: getsockopt: connection refused
<probe> ERRO: 2016/02/25 15:19:25.892155 Error doing publish for 45.55.139.9:4040, backing off 1s: Post http://45.55.139.9:4040/api/report: dial tcp 45.55.139.9:4040: getsockopt: connection refused
<probe> INFO: 2016/02/25 15:19:26.889445 Control connection to 127.0.0.1:4040 starting
<probe> ERRO: 2016/02/25 15:19:26.921286 Error doing controls for 45.55.139.9:4040, backing off 8s: dial tcp 45.55.139.9:4040: getsockopt: connection refused
<probe> INFO: 2016/02/25 15:19:28.898284 Publish loop for 127.0.0.1:4040 starting
<probe> INFO: 2016/02/25 15:19:29.902807 Publish loop for 10.32.0.1:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.904353 Control connection to 10.32.0.1:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.904745 Publish loop for 10.37.85.86:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.907913 Control connection to 10.37.85.86:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.908293 Publish loop for 104.236.54.23:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.912126 Control connection to 104.236.54.23:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.912445 Control connection to 45.55.139.9:4040 exiting
<probe> INFO: 2016/02/25 15:19:29.912736 Publish loop for 45.55.139.9:4040 exiting
<app> INFO: 2016/02/25 15:19:31.502230 Error reading from probe 2b8296e73a5040ef control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<app> INFO: 2016/02/25 15:19:31.503839 Error reading from probe 2b8296e73a5040ef control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<app> INFO: 2016/02/25 15:19:32.146551 Error reading from probe 2a95db3f43e75221 control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<app> INFO: 2016/02/25 15:19:32.149006 Error reading from probe 2a95db3f43e75221 control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<app> INFO: 2016/02/25 15:19:32.435581 Error reading from probe 8d0a67384484b8f control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<app> INFO: 2016/02/25 15:19:32.442078 Error reading from probe 8d0a67384484b8f control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<probe> INFO: 2016/02/25 15:19:35.941051 docker container: collecting stats for 3b9bc3d60d1149525742ab7313ef51072298eba07f97271109933f701e8ee72e
<probe> ERRO: 2016/02/25 15:19:35.947853 Error gather stats for container: 3b9bc3d60d1149525742ab7313ef51072298eba07f97271109933f701e8ee72e
<probe> ERRO: 2016/02/25 15:19:36.068103 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/25 15:19:36.068177 docker container: stopped collecting stats for 3b9bc3d60d1149525742ab7313ef51072298eba07f97271109933f701e8ee72e
<app> INFO: 2016/02/25 15:19:40.317272 Error reading from probe 5ec4a05f2622402 control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<app> INFO: 2016/02/25 15:19:40.319226 Error reading from probe 5ec4a05f2622402 control websocket: websocket: close 1006 (abnormal closure): unexpected EOF
<probe> INFO: 2016/02/25 15:19:55.309094 docker container: collecting stats for 46ca5ad558e0bf5ad5982f10e750b72e7f5f154da2ab666b492c66aec69d862e
<probe> INFO: 2016/02/25 15:19:55.314646 docker container: collecting stats for 46ca5ad558e0bf5ad5982f10e750b72e7f5f154da2ab666b492c66aec69d862e
<probe> INFO: 2016/02/25 15:19:55.396383 docker container: stopped collecting stats for 46ca5ad558e0bf5ad5982f10e750b72e7f5f154da2ab666b492c66aec69d862e
<probe> INFO: 2016/02/25 15:19:55.396429 docker container: stopped collecting stats for 46ca5ad558e0bf5ad5982f10e750b72e7f5f154da2ab666b492c66aec69d862e
<probe> INFO: 2016/02/25 15:20:00.611373 docker container: collecting stats for 23abaef0af325e623dc3a9a4b5ce30533468852a58f8ac17cbd7a9a13124685a
<probe> ERRO: 2016/02/25 15:20:00.684120 docker container: error reading event, did container stop? read unix @->/var/run/docker.sock: use of closed network connection
<probe> INFO: 2016/02/25 15:20:00.684154 docker container: stopped collecting stats for 23abaef0af325e623dc3a9a4b5ce30533468852a58f8ac17cbd7a9a13124685a
<probe> INFO: 2016/02/25 15:20:02.786925 docker container: collecting stats for 57e621f8bca1436515e9015f98475466bb78ec5295e9342b1f4205d81a16546a
<probe> INFO: 2016/02/25 15:20:02.876550 docker container: stopped collecting stats for 57e621f8bca1436515e9015f98475466bb78ec5295e9342b1f4205d81a16546a
<probe> INFO: 2016/02/25 15:20:38.722507 docker container: collecting stats for 7c633cf5ea9fb673d133caa42e4fd91b88e1eaff44d25c38e2d6598243906e24
<probe> INFO: 2016/02/25 15:20:38.827788 docker container: stopped collecting stats for 7c633cf5ea9fb673d133caa42e4fd91b88e1eaff44d25c38e2d6598243906e24
<probe> INFO: 2016/02/25 15:20:46.138782 docker container: collecting stats for 5778b193bee97a63774b32d2c82e0b6959d275c15660423304db27de774e8123
<probe> INFO: 2016/02/25 15:20:46.233608 docker container: stopped collecting stats for 5778b193bee97a63774b32d2c82e0b6959d275c15660423304db27de774e8123

@2opremio
Contributor

So, there are two problems:

  1. Scope resolves a mix of DNS records covering weave, eth0, and container addresses (this was already happening with 0.12).
  2. On master, the publish loop fails (possibly legitimately, given the installation order), but once the publish loop exits, Scope doesn't try again.
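
To illustrate the second problem, here is a minimal sketch of a publish loop that keeps retrying instead of exiting after a failure. It is not Scope's code (Scope is written in Go); `backoff_delays`, `publish_with_retry`, and the `publish` and `sleep` callables are hypothetical names chosen for illustration. The 1s, 2s, 4s, 8s progression and the one-minute cap are taken from the "backing off" lines in the logs above; everything else is an assumption.

```python
import itertools
from typing import Callable, Iterator, List, Optional


def backoff_delays(initial: float = 1.0, maximum: float = 60.0) -> Iterator[float]:
    """Yield delays that double each time (1s, 2s, 4s, ...), capped at `maximum`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)


def publish_with_retry(
    publish: Callable[[], None],
    sleep: Callable[[float], None],
    max_attempts: Optional[int] = None,
) -> int:
    """Call `publish` until it succeeds and return the number of attempts made.

    Connection errors (subclasses of OSError) trigger a backoff and another
    attempt. With max_attempts=None the loop never gives up, which is the
    behaviour the comment above says is missing once the publish loop exits.
    """
    for attempt, delay in enumerate(backoff_delays(), start=1):
        try:
            publish()
            return attempt
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
    raise AssertionError("unreachable: backoff_delays() is infinite")


if __name__ == "__main__":
    # Hypothetical target that refuses the first three connections.
    failures_left = [3]
    slept: List[float] = []

    def fake_publish() -> None:
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ConnectionRefusedError("connection refused")

    attempts = publish_with_retry(fake_publish, slept.append)
    print(attempts, slept)
    print(list(itertools.islice(backoff_delays(), 8)))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real probe would pass `time.sleep` and would probably also want a way to cancel the loop when a target is removed from DNS.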

@errordeveloper
Contributor Author

Also, running scope stop ; scope launch on each node resolves the issue.

@errordeveloper
Contributor Author

I've just tried stopping everything, downgrading, and upgrading again, and noticed that for a brief period after upgrading the probes do publish, and that is visible in the app.
