network kind is ambiguous #1596

Closed
jdef opened this issue May 15, 2020 · 14 comments · Fixed by #1831
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@jdef

jdef commented May 15, 2020

What happened:
Kicked off multiple builds in our CI environment; some tests use KIND to spin up clusters. Saw a bunch of failures:

  ✗ Preparing nodes 📦
 ERROR: failed to create cluster: docker run error: command "docker run --hostname ci-a510791-control-plane --name ci-a510791-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=ci-a510791 --net kind --restart=on-failure:1 --volume=/workspace/pr-113/e2e/etc/rootca1.crt:/usr/local/share/ca-certificates/rootca1.crt:ro --volume=/workspace/pr-113/e2e/etc/rootca2.crt:/usr/local/share/ca-certificates/rootca2.crt:ro --publish=127.0.0.1:40759:6443/TCP kindest/node:v1.17.5" failed with error: exit status 125
 Command Output: 4614c0b36ac6a3e641b0a300d07b6b0bc7317132fab3d494d21a3e4777aa5d5a
 docker: Error response from daemon: network kind is ambiguous (2 matches found on name).

What you expected to happen:
No interference between tests, as was the case with KIND 0.7.x.

How to reproduce it (as minimally and precisely as possible):

  • create many KIND clusters at the same time, starting from a clean box (w/o a kind network already created)?

Anything else we need to know?:

# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
98fdc5e5af26        bridge              bridge              local
6f8de42fadad        host                host                local
de87cb7dd35e        kind                bridge              local
158648a47e91        kind                bridge              local
75afe274f8f8        none                null                local

Environment:

  • kind version: (use kind version): kind v0.8.1 go1.13.9 linux/amd64
  • Kubernetes version: (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-04-16T11:44:03Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 122
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.3.0-46-generic
 Operating System: Ubuntu 19.10
 OSType: linux
 Architecture: x86_64
 CPUs: 104
 Total Memory: 754.6GiB
 Name: 7959d5c46f-m9c7p
 ID: UMPE:ZM2Z:POMD:VQDB:7JAM:7OV5:LNJE:XP5W:EX4Z:CA5N:GO35:IBFY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="19.10 (Eoan Ermine)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.10"
VERSION_ID="19.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=eoan
UBUNTU_CODENAME=eoan
jdef added the kind/bug label on May 15, 2020
@BenTheElder
Member

For now you're going to need to serialize creating the first cluster. I'm not sure if there's a non-racy way to do this in docker.

@BenTheElder
Member

xref: moby/moby#20648 docker-compose has this same issue :/

BenTheElder self-assigned this on May 15, 2020
@BenTheElder
Member

I don't think docker gives us sufficient tools to avoid a race condition here when coordinating a docker network between multiple processes, unless we do our own out-of-band multi-process locking.

... that's not something I'm excited to add right now, and it's full of potential problems. Would it be acceptable instead if we developed sufficient tooling to allow kind to natively create multiple clusters in one command? I've been sketching out a design for that functionality anyhow.
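
For illustration only, a minimal sketch of what such out-of-band locking could look like: an advisory flock(2) lock held around cluster creation (coarser than strictly needed, but simple), so that only one process on the host creates the shared network at a time. The lock path and the exec of kind create cluster are assumptions for this sketch (Linux-only), not anything kind ships.

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// withFileLock runs fn while holding an exclusive advisory lock on path,
// serializing callers across processes on the same host.
func withFileLock(path string, fn func() error) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

func main() {
	// Hypothetical lock file; every CI job on the host must agree on it.
	err := withFileLock("/tmp/kind-network.lock", func() error {
		cmd := exec.Command("kind", "create", "cluster", "--name", os.Getenv("CLUSTER_NAME"))
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		return cmd.Run()
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "create cluster:", err)
		os.Exit(1)
	}
}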

@jdef
Author

jdef commented May 20, 2020 via email

@BenTheElder
Member

BenTheElder commented May 20, 2020 via email

@BenTheElder
Member

Current workarounds are:

  • Pre-create a kind docker network (and try to get the options right / reasonable)
  • Serialize creating one cluster yourself first (and solve the ordering issue yourself)
  • Retry cluster creation once (and accept that we may flake very early on due to another concurrent attempt causing this race)
  • Use the experimental KIND_EXPERIMENTAL_DOCKER_NETWORK env to make the network unique per cluster, knowing that you'll have to deal with cleanup or accumulate networks without bound, and that we may not choose to support this long term (see the sketch after this list).
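
To illustrate the last option, a minimal sketch (not from this thread) of bringing up several clusters concurrently from Go, giving each its own docker network via KIND_EXPERIMENTAL_DOCKER_NETWORK. The cluster names are made up, and cleanup of the per-cluster networks is left to the caller.

package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

func main() {
	// Hypothetical cluster names used by concurrent CI jobs.
	clusters := []string{"ci-a", "ci-b", "ci-c"}

	var wg sync.WaitGroup
	for _, name := range clusters {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			cmd := exec.Command("kind", "create", "cluster", "--name", name)
			// One docker network per cluster sidesteps the shared "kind"
			// network entirely; these extra networks are not cleaned up by kind.
			cmd.Env = append(os.Environ(), "KIND_EXPERIMENTAL_DOCKER_NETWORK="+name)
			cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
			if err := cmd.Run(); err != nil {
				fmt.Fprintf(os.Stderr, "create %s: %v\n", name, err)
			}
		}(name)
	}
	wg.Wait()
}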

@BenTheElder
Member

I built a test bed with https://godoc.org/github.com/docker/docker/api/types#NetworkCreate CheckDuplicate, and it is reliably insufficient.

package main

import (
	"context"
	"fmt"
	"sync"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}

	networkName := "test"

	// createNetwork asks the daemon for a bridge network with CheckDuplicate
	// set; the duplicate check is not atomic with the create, so two
	// concurrent calls can both succeed with the same name.
	createNetwork := func() {
		r, e := cli.NetworkCreate(context.Background(), networkName, types.NetworkCreate{
			CheckDuplicate: true,
			Driver:         "bridge",
		})
		fmt.Println(r, e)
	}
	deleteNetwork := func() {
		fmt.Println(cli.NetworkRemove(context.Background(), networkName))
	}

	// Race two concurrent creates, then attempt to remove the network by name.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		createNetwork()
		wg.Done()
	}()
	go func() {
		createNetwork()
		wg.Done()
	}()
	wg.Wait()
	deleteNetwork()
}

results (always the same, except the random IDs):

$ go run .
{8d6b80658e72d596f19c35bd90226171056dc9f93610aec3c2b55b20ad55ff4e } <nil>
{ad09baf925e2a213132c1b9072ec54bc70aaaa0e558a771cc3de2b509d72e948 } <nil>
Error response from daemon: network test is ambiguous (2 matches found based on name)

@BenTheElder
Member

I've got a pretty good idea how we can hack a working solution, but it's going to be ... a hack.

@BenTheElder
Member

BenTheElder commented Jul 29, 2020

Wrote up a detailed outline of the hack I'm considering
https://docs.google.com/document/d/1Q7Njyco2mAz66lS44pVV7ixT22RAkqBrmVMetG1zuT4

(Shared with members of [email protected], our standard SIG Testing group. I can't open documents to the entire internet by automated policy, but I can share with groups. The group is open to join; join it for access. This is common for Kubernetes documents.)

BenTheElder added this to the v0.9.0 milestone on Aug 20, 2020
BenTheElder added the lifecycle/active label on Sep 12, 2020
@BenTheElder
Member

This should be mitigated in 0.9.0 (just released; this was the last blocking issue). Please let us know if you still encounter issues.

@BenTheElder
Member

FYI @howardjohn @JeremyOT it should be safe to do concurrent multi-cluster bringup in CI in v0.9.0 without any workarounds.

@dgn

dgn commented Dec 20, 2022

We're hitting this again in 0.16.0; have there been any changes that might affect the previous fix?

@BenTheElder
Member

Not that I'm aware of.

If you have a reliable reproducer or any other new info, please file a new bug.

We haven't had any reports of this issue in the 2+ years since the fix.

@BenTheElder
Member

This will be patched in docker going forward, so we shouldn't see any more duplicate networks in future releases: moby/moby#20648 (comment)

We'll need to keep the workaround we shipped for now, but it should be fixed in docker going forward 🎉
