Docker swarm overlay networking not working after --force-new-cluster #495
Comments
One thing I've discovered via debugging is that a change introduced in this commit might be responsible: moby/libnetwork@5008b0c. If line 259 of controller.go is changed to simply be
then the problem goes away. This is because at some point the agent is stopped, and it is never restarted in the case of swarm init with --force-new-cluster. Maybe there is some other way to prevent this race condition, such as checking whether the agent is really active? I will do more digging and add what I find.
The libnetwork fix was included in Docker 18.09.4 through docker-archive/engine#169; should this one be closed?
Oh, sorry, it was not yet in 18.09; cherry-picking now.
full diff: moby/libnetwork@c902989...872f0a8
 - moby/libnetwork#2354 [18.09 backport] Cleanup the cluster provider when the agent is closed
   - backport of moby/libnetwork#2307 Fix for problem where agent is stopped and does not restart
   - fixes docker/for-linux#495 Docker swarm overlay networking not working after --force-new-cluster
 - moby/libnetwork#2369 [18.09 BACKPORT] Pick a random host port if the user does not specify a host port
   - backport of moby/libnetwork#2368 (windows) Pick a random host port if the user does not specify a host port
Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: 5354408039681020f9ad6afe4bf696fc90f9ce69
Component: engine
Hello everyone, I'm wondering whether this issue is really resolved, as I seem to be facing the same kind of name resolution problem after issuing "docker swarm init --force-new-cluster" on an "isolated" manager.
One big difference in my scenario is that I'm NOT deploying services through Swarm. I'm deploying containers through classic docker-compose and only using an overlay network managed by Swarm, to which I attach the containers in docker-compose.
Things work fine and the containers are able to communicate. Now suppose one of the 2 manager nodes fails. At this point I have to run "docker swarm init --force-new-cluster" on the survivor, but as soon as I issue the command, I can see in the container logs that the containers become unable to resolve each other's names (I get "Name or service not known" errors).
Based on my tests, name resolution only works again when I restart the container I'm trying to resolve on the surviving node. It looks as if, on startup, the container somehow registers itself again on the "new" swarm overlay network that was recreated when I issued the "force-new-cluster" command.
Here's an example of the issue. Just after issuing the "force-new-cluster" command on the survivor, from the surviving containers I can't resolve any of the other containers' names:
Now if I just restart "another-container":
From the first container, name resolution works again:
Any idea whether this issue could be related to the fact that I'm only attaching containers to the overlay network using docker-compose, rather than managing them through Swarm itself? Thanks for your time!
Expected behavior
After a --force-new-cluster and the subsequent addition of a new node to the cluster, tasks.servicename should be resolved by the internal Docker DNS, and containers on the same overlay network should be able to reach each other.
Actual behavior
On the node on which --force-new-cluster was executed, the tasks.servicename endpoint does not resolve. On the added node, tasks.servicename does resolve, but only to the container on the one node. Also, containers on the same overlay network cannot reach each other by their IPs.
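To tell the two failure modes above apart from inside a container, a small check along the following lines may help. This is only a sketch: the helper names `resolve_ipv4` and `check_tasks_resolution` and the status strings are made up for illustration, and it assumes a Python interpreter is available in a container attached to the overlay network.

```python
import socket
from typing import Callable, List


def resolve_ipv4(name: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to.

    An empty list means the lookup failed (for example with
    "Name or service not known").
    """
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def check_tasks_resolution(
    service: str,
    expected_tasks: int,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> str:
    """Classify how tasks.<service> resolves (hypothetical status labels).

    "unresolved": no address at all, as reported on the node where
                  --force-new-cluster was run.
    "partial":    fewer addresses than expected tasks, as reported on
                  the added node.
    "ok":         at least one address per expected task.
    """
    addresses = resolver(f"tasks.{service}")
    if not addresses:
        return "unresolved"
    if len(addresses) < expected_tasks:
        return "partial"
    return "ok"
```

For the global service in this report, running `check_tasks_resolution("demo", 2)` in a container on each of two nodes would be expected to return "ok" on a healthy cluster. The `resolver` parameter exists only so the classification can be exercised without a live swarm.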
Steps to reproduce the behavior
docker network create --scope swarm --driver overlay --attachable test
docker service create --network test --mode global --name demo demo
docker service logs demo
docker swarm init --force-new-cluster (on one of the nodes)
Restarting the docker daemon on the first node does resolve the issue.
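The reproduction steps above could be collected into a small script so they can be replayed consistently. The following is a rough sketch rather than a tested reproduction: the function names `repro_commands` and `run_repro` are invented here, the defaults mirror the names used in the steps (`test` network, `demo` service and image), and the dry-run default is a precaution because `--force-new-cluster` rewrites the swarm state of the node it runs on.

```python
import shlex
import subprocess
from typing import List


def repro_commands(
    network: str = "test", service: str = "demo", image: str = "demo"
) -> List[List[str]]:
    """Return the docker commands from the report, in order, as argv lists."""
    return [
        ["docker", "network", "create", "--scope", "swarm",
         "--driver", "overlay", "--attachable", network],
        ["docker", "service", "create", "--network", network,
         "--mode", "global", "--name", service, image],
        ["docker", "service", "logs", service],
        # Run on one of the nodes only.
        ["docker", "swarm", "init", "--force-new-cluster"],
    ]


def run_repro(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command and, unless dry_run is set, execute it.

    Returns the printed command lines so callers can log or inspect them.
    """
    printed = []
    for argv in commands:
        line = shlex.join(argv)
        print("+", line)
        printed.append(line)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)
    return printed
```

Calling `run_repro(repro_commands())` only prints the commands. Passing `dry_run=False` executes them against the local Docker daemon and should be done only on a disposable test cluster.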
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.)
gcloud instances. Have reproduced the same behavior on 18.09 ce as well.