
multi-node: Kubernetes cluster does not start after Docker re-assigns node's IP addresses after (Docker) restart #2045

Closed
hadrabap opened this issue Jan 30, 2021 · 34 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@hadrabap

hadrabap commented Jan 30, 2021

A kind Kubernetes cluster does not survive a Docker restart. It seems that Docker assigns new IPs to the containers on each start-up. The kind nodes, however, keep the original IP addresses in the generated configuration files, so the Kubernetes services are unable to talk to each other. The most affected components are the scheduler and the controller manager.

What happened:

Kubernetes starts in a broken state even though kubectl get pods -A reports otherwise (everything 1/1). The cluster is unable to start pods that were deployed before the restart and cannot deploy anything new because the scheduler is not connected to the API server.

What you expected to happen:

Kubernetes cluster continues working as expected even after Docker restart.

How to reproduce it (as minimally and precisely as possible):

  1. Install KIND cluster by issuing:
cat <<EOF | kind create cluster --name kind --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF
  2. Restart Docker
  3. Deploy anything, e.g.: kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
  4. Check that the dnsutils pod is in the Pending state

Anything else we need to know?:

  1. Log files: kind-cluster-logs.tar.gz
  2. I tried to change the IP addresses in the /kind and /etc/kubernetes files, but then the services started complaining that the certificate was not issued for the IP address. Changing the IP addresses each time the cluster starts is therefore not a solution.

Environment:

  • kind version: (use kind version):
kind v0.10.0 go1.15.7 darwin/amd64
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Client:
 Version:           20.10.0
 API version:       1.41
 Go version:        go1.15.6
 Git commit:        03fa4b8
 Built:             Sat Dec 12 20:00:39 2020
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • OS (e.g. from /etc/os-release):
    macOS Catalina 10.15.7 Intel
@hadrabap hadrabap added the kind/bug Categorizes issue or PR as related to a bug. label Jan 30, 2021
@markush81

markush81 commented Jan 31, 2021

I was just about to open the same issue; my information is as follows:

Environment:

  • macOS Big Sur 11.1
  • Docker 20.10.2
  • kind v0.10.0 go1.15.7 darwin/amd64

Kind setup

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600
- role: worker
  image: kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600
- role: worker
  image: kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600

After creating the cluster, inspecting the related Docker network shows:

...
"Containers": {
            "780b60602e52be16b47f464e861ad065ac9737e9d2f330f18a65dbe242effe91": {
                "Name": "my-k8s-control-plane",
                "EndpointID": "6a14b15c13f74cf39df539c7e85b9e43b0af9d87626fad7f848bf815e40bfb6e",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": "fc00:f853:ccd:e793::2/64"
            },
            "8c1f003761a65e4dba126bd027440ed24050c8ff9730a37e0ac36b37aace61ba": {
                "Name": "my-k8s-worker2",
                "EndpointID": "93fb4c3abbb2cf3fe8470d8a01b779eae12b488088113ae8b9bd25eea2060daf",
                "MacAddress": "02:42:ac:12:00:04",
                "IPv4Address": "172.18.0.4/16",
                "IPv6Address": "fc00:f853:ccd:e793::4/64"
            },
            "b0623460e508b90eedc47a65e8d10218f70f6cac2d3d4695419dd06496faa9db": {
                "Name": "my-k8s-worker",
                "EndpointID": "1d2a0903a81aa6f8074809c37c9d69ad839c61b0521a5ef345041f4c22a69a50",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": "fc00:f853:ccd:e793::3/64"
            }
        }
...
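If you want to script the before/after comparison, you can pull the name-to-address mapping out of this JSON. Below is a minimal Python sketch. It assumes you have captured the full output of docker network inspect for the kind network as a string (a JSON array of network objects, each with a "Containers" map like the excerpt above). The helper name node_ips is hypothetical, and the sample uses shortened container IDs and an assumed network name of "kind".

```python
import ipaddress
import json


def node_ips(inspect_json: str) -> dict[str, str]:
    """Map container name -> IPv4 address from `docker network inspect` JSON."""
    result: dict[str, str] = {}
    for network in json.loads(inspect_json):  # docker prints a JSON array
        for container in network.get("Containers", {}).values():
            addr = container.get("IPv4Address")
            if not addr:
                continue  # no IPv4 address recorded for this endpoint
            # "IPv4Address" includes the prefix length, e.g. "172.18.0.2/16"
            result[container["Name"]] = str(ipaddress.ip_interface(addr).ip)
    return result


# Trimmed copy of the output above (IDs shortened, other fields dropped).
sample = """[{"Name": "kind", "Containers": {
  "780b60602e52": {"Name": "my-k8s-control-plane", "IPv4Address": "172.18.0.2/16"},
  "8c1f003761a6": {"Name": "my-k8s-worker2", "IPv4Address": "172.18.0.4/16"},
  "b0623460e508": {"Name": "my-k8s-worker", "IPv4Address": "172.18.0.3/16"}}}]"""

print(node_ips(sample))
```

Running it once before and once after a restart gives you the two mappings to compare.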

After a reboot of the whole machine or a Docker restart, the IPs have usually changed:

LAST SEEN   TYPE      REASON                    OBJECT                      MESSAGE
14h         Normal    Starting                  node/my-k8s-control-plane   Starting kubelet.
14h         Normal    NodeHasSufficientMemory   node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasSufficientMemory
14h         Normal    NodeHasNoDiskPressure     node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasNoDiskPressure
14h         Normal    NodeHasSufficientPID      node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasSufficientPID
14h         Normal    NodeAllocatableEnforced   node/my-k8s-control-plane   Updated Node Allocatable limit across pods
14h         Normal    Starting                  node/my-k8s-control-plane   Starting kubelet.
14h         Normal    NodeAllocatableEnforced   node/my-k8s-control-plane   Updated Node Allocatable limit across pods
14h         Normal    NodeHasSufficientMemory   node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasSufficientMemory
14h         Normal    NodeHasNoDiskPressure     node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasNoDiskPressure
14h         Normal    NodeHasSufficientPID      node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasSufficientPID
14h         Normal    RegisteredNode            node/my-k8s-control-plane   Node my-k8s-control-plane event: Registered Node my-k8s-control-plane in Controller
14h         Normal    Starting                  node/my-k8s-control-plane   Starting kube-proxy.
14h         Normal    NodeReady                 node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeReady
13m         Normal    Starting                  node/my-k8s-control-plane   Starting kubelet.
13m         Normal    NodeHasSufficientMemory   node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasSufficientMemory
13m         Normal    NodeHasNoDiskPressure     node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasNoDiskPressure
13m         Normal    NodeHasSufficientPID      node/my-k8s-control-plane   Node my-k8s-control-plane status is now: NodeHasSufficientPID
13m         Normal    NodeAllocatableEnforced   node/my-k8s-control-plane   Updated Node Allocatable limit across pods
13m         Normal    Starting                  node/my-k8s-control-plane   Starting kube-proxy.
14h         Normal    Starting                  node/my-k8s-worker          Starting kubelet.
14h         Normal    NodeHasSufficientMemory   node/my-k8s-worker          Node my-k8s-worker status is now: NodeHasSufficientMemory
14h         Normal    NodeHasNoDiskPressure     node/my-k8s-worker          Node my-k8s-worker status is now: NodeHasNoDiskPressure
14h         Normal    NodeHasSufficientPID      node/my-k8s-worker          Node my-k8s-worker status is now: NodeHasSufficientPID
14h         Normal    NodeAllocatableEnforced   node/my-k8s-worker          Updated Node Allocatable limit across pods
14h         Normal    RegisteredNode            node/my-k8s-worker          Node my-k8s-worker event: Registered Node my-k8s-worker in Controller
14h         Normal    Starting                  node/my-k8s-worker          Starting kube-proxy.
14h         Normal    NodeReady                 node/my-k8s-worker          Node my-k8s-worker status is now: NodeReady
13m         Normal    Starting                  node/my-k8s-worker          Starting kubelet.
13m         Normal    NodeHasSufficientMemory   node/my-k8s-worker          Node my-k8s-worker status is now: NodeHasSufficientMemory
13m         Normal    NodeHasNoDiskPressure     node/my-k8s-worker          Node my-k8s-worker status is now: NodeHasNoDiskPressure
13m         Normal    NodeHasSufficientPID      node/my-k8s-worker          Node my-k8s-worker status is now: NodeHasSufficientPID
13m         Normal    NodeAllocatableEnforced   node/my-k8s-worker          Updated Node Allocatable limit across pods
13m         Warning   Rebooted                  node/my-k8s-worker          Node my-k8s-worker has been rebooted, boot id: de45af38-2b9c-4634-b4aa-5ef1dab149df
13m         Normal    Starting                  node/my-k8s-worker          Starting kube-proxy.
14h         Normal    Starting                  node/my-k8s-worker2         Starting kubelet.
14h         Normal    NodeHasSufficientMemory   node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeHasSufficientMemory
14h         Normal    NodeHasNoDiskPressure     node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeHasNoDiskPressure
14h         Normal    NodeHasSufficientPID      node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeHasSufficientPID
14h         Normal    NodeAllocatableEnforced   node/my-k8s-worker2         Updated Node Allocatable limit across pods
14h         Normal    RegisteredNode            node/my-k8s-worker2         Node my-k8s-worker2 event: Registered Node my-k8s-worker2 in Controller
14h         Normal    Starting                  node/my-k8s-worker2         Starting kube-proxy.
14h         Normal    NodeReady                 node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeReady
13m         Normal    Starting                  node/my-k8s-worker2         Starting kubelet.
13m         Normal    NodeHasSufficientMemory   node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeHasSufficientMemory
13m         Normal    NodeHasNoDiskPressure     node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeHasNoDiskPressure
13m         Normal    NodeHasSufficientPID      node/my-k8s-worker2         Node my-k8s-worker2 status is now: NodeHasSufficientPID
13m         Normal    NodeAllocatableEnforced   node/my-k8s-worker2         Updated Node Allocatable limit across pods
13m         Warning   Rebooted                  node/my-k8s-worker2         Node my-k8s-worker2 has been rebooted, boot id: de45af38-2b9c-4634-b4aa-5ef1dab149df
13m         Normal    Starting                  node/my-k8s-worker2         Starting kube-proxy.
 "Containers": {
            "780b60602e52be16b47f464e861ad065ac9737e9d2f330f18a65dbe242effe91": {
                "Name": "my-k8s-control-plane",
                "EndpointID": "40ac0af6846bce624d6e20f3fe816a699999491ec006908ae3dfdbff3ed34027",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": "fc00:f853:ccd:e793::3/64"
            },
            "8c1f003761a65e4dba126bd027440ed24050c8ff9730a37e0ac36b37aace61ba": {
                "Name": "my-k8s-worker2",
                "EndpointID": "b36d5e2db63cecdf960519521de6e2bb32402bf2e0db0b07d505344f368a17db",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": "fc00:f853:ccd:e793::2/64"
            },
            "b0623460e508b90eedc47a65e8d10218f70f6cac2d3d4695419dd06496faa9db": {
                "Name": "my-k8s-worker",
                "EndpointID": "9c38c3f4c91b97cb213d520eda7b1af60fe682584f35c62c0d51773bd24ab282",
                "MacAddress": "02:42:ac:12:00:04",
                "IPv4Address": "172.18.0.4/16",
                "IPv6Address": "fc00:f853:ccd:e793::4/64"
            }
        },

Changed IP addresses:


my-k8s-control-plane: 172.18.0.2/16 -> 172.18.0.3/16
my-k8s-worker: 172.18.0.3/16 -> 172.18.0.4/16
my-k8s-worker2: 172.18.0.4/16 -> 172.18.0.2/16

This mix-up confuses the cluster internally, since it somehow does not pick up these updates.
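The mix-up can be checked mechanically. The Python sketch below compares two name-to-IP mappings, for example ones taken from docker network inspect before and after a restart. It also reports which old addresses now belong to a different node. Both function names are hypothetical.

```python
def changed_ips(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {node: (old_ip, new_ip)} for nodes whose address changed."""
    return {
        name: (old, after[name])
        for name, old in before.items()
        if name in after and after[name] != old
    }


def reassigned(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {old_ip: (previous_owner, current_owner)} for addresses now held by another node."""
    owner_now = {ip: name for name, ip in after.items()}
    return {
        ip: (name, owner_now[ip])
        for name, ip in before.items()
        if ip in owner_now and owner_now[ip] != name
    }


# Addresses from the two docker network inspect excerpts in this comment.
before = {"my-k8s-control-plane": "172.18.0.2", "my-k8s-worker": "172.18.0.3",
          "my-k8s-worker2": "172.18.0.4"}
after = {"my-k8s-control-plane": "172.18.0.3", "my-k8s-worker": "172.18.0.4",
         "my-k8s-worker2": "172.18.0.2"}

for name, (old, new) in changed_ips(before, after).items():
    print(f"{name}: {old} -> {new}")
print(reassigned(before, after)["172.18.0.2"])  # ('my-k8s-control-plane', 'my-k8s-worker2')
```

In this example the control plane's old address 172.18.0.2 now belongs to my-k8s-worker2. Anything still dialing 172.18.0.2:6443 therefore reaches a worker node, which does not run an API server. That matches the "connection refused" errors in the controller-manager and scheduler logs further down in this comment.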

The pods themselves all reach the Running state, but the cluster is non-functional; for example, you can't start a new pod:

kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
error: timed out waiting for the condition

kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
busybox   0/1     Pending   0          10m

kubectl describe pod busybox --namespace='default'

Name:         busybox
Namespace:    default
Priority:     0
Node:         <none>
Labels:       run=busybox
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  busybox:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      sh
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wghwb (ro)
Volumes:
  default-token-wghwb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wghwb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

Looking through the logs of various pods:

kube-controller-manager-my-k8s-control-plane

 E0131 08:54:15.863481       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-controller-manager: Get "https://172.18.0.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s": dial tcp 172.18.0.2:6443: connect: connection refused
 E0131 08:54:19.186263       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-controller-manager: Get "https://172.18.0.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s": dial tcp 172.18.0.2:6443: connect: connection refused

coredns

 I0131 08:45:57.742283       1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125 (started: 2021-01-31 08:45:27.736738195 +0000 UTC m=+0.425505076) (total time: 30.004062522s):
Trace[939984059]: [30.004062522s] [30.004062522s] END
E0131 08:45:57.742331       1 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout

kube-apiserver-my-k8s-control-plane

 Trace[1533160523]: ---"Transformed response object" 14383ms (08:54:00.090)
 Trace[1533160523]: [14.386428279s] [14.386428279s] END
 I0131 08:54:50.261412       1 trace.go:205] Trace[230667356]: "Get" url:/api/v1/namespaces/kube-system/pods/coredns-f9fd979d6-p4tgv/log,user-agent:kubernetic-backend/v0.0.0 (darwin/amd64) kubernetes/$Format,client:172.18.0.1 (31-Jan-2021 08:54:47.025) (total time: 3235ms):
 Trace[230667356]: ---"Transformed response object" 3234ms (08:54:00.261)
 Trace[230667356]: [3.235978319s] [3.235978319s] END
 I0131 08:55:00.223341       1 client.go:360] parsed scheme: "passthrough"
 I0131 08:55:00.223424       1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://127.0.0.1:2379  <nil> 0 <nil>}] <nil> <nil>}
 I0131 08:55:00.223437       1 clientconn.go:948] ClientConn switching balancer to "pick_first"
 I0131 08:55:00.441118       1 trace.go:205] Trace[980747837]: "Get" url:/api/v1/namespaces/kube-system/pods/kindnet-drjnp/log,user-agent:kubernetic-backend/v0.0.0 (darwin/amd64) kubernetes/$Format,client:172.18.0.1 (31-Jan-2021 08:54:53.639) (total time: 6801ms):
 Trace[980747837]: ---"Transformed response object" 6799ms (08:55:00.441)
 Trace[980747837]: [6.801401461s] [6.801401461s] END

kube-scheduler-my-k8s-control-plane

 E0131 08:55:45.455623       1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://172.18.0.2:6443/api/v1/pods?fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0": dial tcp 172.18.0.2:6443: connect: connection refused
 E0131 08:55:45.840158       1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: Get "https://172.18.0.2:6443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0": dial tcp 172.18.0.2:6443: connect: connection refused
 E0131 08:55:58.745228       1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: Get "https://172.18.0.2:6443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0": dial tcp 172.18.0.2:6443: connect: connection refused
 E0131 08:56:01.444706       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Get "https://172.18.0.2:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dextension-apiserver-authentication&limit=500&resourceVersion=0": dial tcp 172.18.0.2:6443: connect: connection refused

@hadrabap hadrabap reopened this Jan 31, 2021
@hadrabap
Author

OK, I've realized that kind creates its own Docker network. Thank you @markush81!

I've patched the kind Docker provider so that it (currently) generates sequential IP addresses for the nodes and forces them with docker ... --ip XXX.

This seems to solve the issue: Docker reuses the IPs after a restart.

Take a look at my branch.

I'll try to generalize it and make it more flexible, but I might fail as I have no idea about Go. :-/

@aojea
Contributor

aojea commented Jan 31, 2021

/assign

Custom IP allocation was discussed before, and it is not easy to implement; keep in mind that you can have multiple clusters created at the same time ...

It would be interesting to understand the root cause; we switched to DNS in most parts to allow restarts ... can this be a Kubernetes limitation, or are we missing something?

@hadrabap
Author

hadrabap commented Jan 31, 2021

So, I've implemented a simple mechanism that:

  1. Gets the network address of the kind Docker network
  2. Obtains all IP addresses already assigned in that network
  3. Sequentially generates IP addresses in that network and uses each one that is not in the assigned list.

The mechanism has one disadvantage, however: it fills any gaps. But it works even when additional clusters are created later.

Check out the code, but don't take it too seriously; these are the first lines of Go I have ever written.
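If you don't have access to that branch, the three steps above can be sketched in Python. This illustrates the described approach; it is not the code from the branch. Treating the gateway (assumed here to be 172.18.0.1) as already assigned is an assumption, and the function name is hypothetical.

```python
import ipaddress
from collections.abc import Iterable


def next_free_ips(subnet: str, assigned: Iterable[str], count: int) -> list[str]:
    """Return the first `count` host addresses in `subnet` not present in `assigned`.

    The scan runs sequentially from the bottom of the subnet, so (like the
    mechanism described above) it also fills gaps left by removed containers.
    """
    if count <= 0:
        return []
    taken = {ipaddress.ip_address(ip) for ip in assigned}
    free: list[str] = []
    for host in ipaddress.ip_network(subnet).hosts():  # skips network/broadcast
        if host not in taken:
            free.append(str(host))
            if len(free) == count:
                return free
    raise ValueError(f"only {len(free)} free addresses in {subnet}, need {count}")


# Gateway plus two existing containers; .3 is a gap left by a removed container.
print(next_free_ips("172.18.0.0/16", ["172.18.0.1", "172.18.0.2", "172.18.0.4"], 3))
```

Choosing addresses this way only avoids collisions with containers that already exist at the moment kind runs.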

P.S.: I'll look into how difficult it would be for me to implement a config parameter so that one can specify the Docker IP manually, e.g.:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  docker-ip: 172.18.0.10
- role: worker
  docker-ip: 172.18.0.11
- role: worker
  docker-ip: 172.18.0.12
- role: worker
  docker-ip: 172.18.0.13

P.P.S.: I've uploaded compiled binary for anybody who wants to test it: https://github.com/hadrabap/kind-test-snapshots/blob/main/kind

@hadrabap
Author

It would be interesting to understand the root cause; we switched to DNS in most parts to allow restarts ... can this be a Kubernetes limitation, or are we missing something?

I think this is Docker-related. The Kubernetes cluster itself uses DNS, but it never gets that far because kubeadm is launched by systemd with configuration files that use IPs only. These files are generated by kind during installation. That's why the cluster works right after installation: the IPs are the ones Docker has already assigned. After a restart, new IPs are assigned unless the containers were originally run with the --ip parameter. I found many reports on stackoverflow.com (for example) that by using the --ip parameter one can "fix" the IP and, for example, register it permanently in DNS.

@BenTheElder
Member

These files are generated by kind during installation.

Not strictly.

This is a bug in the entrypoint's IP fixup:

# fixup IPs in manifests ...

@BenTheElder
Member

I've patched kind Docker provider so it (currently) generates sequential IP addresses for each node and forces them with docker ... --ip XXX.

Actually, Docker does not guarantee this unless you create a network that excludes the chosen IPs from the auto-allocated range.
To do that, we would need to predict how many addresses users need for kind versus for containers running alongside kind, which generally complicates things. Even then it's only best effort, not guaranteed: Docker does not provide guaranteed static IP allocation.

You can see more on this in the previous discussion in this repo.
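Here is a concrete version of the point about excluding chosen IPs from the auto-allocated range. docker network create accepts --subnet and --ip-range, and --ip-range limits the pool Docker allocates container addresses from. The helper below is hypothetical. It assumes static node IPs should sit inside the subnet but outside that pool. The example values are made up, and as the comment above says, even a consistent plan is still only best effort.

```python
import ipaddress


def check_static_plan(subnet: str, ip_range: str, static_ips: list[str]) -> list[str]:
    """Return problems with a static-IP plan; an empty list means none were found.

    `ip_range` models the pool Docker allocates container addresses from
    (docker network create --subnet ... --ip-range ...).
    """
    net = ipaddress.ip_network(subnet)
    pool = ipaddress.ip_network(ip_range)
    problems: list[str] = []
    if not pool.subnet_of(net):
        problems.append(f"ip-range {pool} is not inside subnet {net}")
    seen = set()
    for raw in static_ips:
        ip = ipaddress.ip_address(raw)
        if ip in seen:
            problems.append(f"{ip} is listed twice")
        seen.add(ip)
        if ip not in net:
            problems.append(f"{ip} is outside subnet {net}")
        elif ip in pool:
            problems.append(f"{ip} may be handed out automatically from {pool}")
    return problems


# Subnet as in this thread; the split into dynamic pool and static range is made up.
print(check_static_plan("172.18.0.0/16", "172.18.128.0/17", ["172.18.0.10", "172.18.0.11"]))
print(check_static_plan("172.18.0.0/16", "172.18.0.0/24", ["172.18.0.10"]))
```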

@BenTheElder
Member

This also smells like a bug somewhere between kubeadm and kind; there's no reason I shouldn't be able to put the API server behind DNS and have components respect that.

@hadrabap
Author

hadrabap commented Feb 1, 2021

I hope I understand it better now. Let me summarize and please correct me if I'm wrong:

  1. kind installs the cluster with all its initialization and writes IP addresses into the /kind/ and /etc/kubernetes/ directories.
  2. Each time the cluster starts, the entrypoint tries to manage the IP addresses somehow.

What I've found is that the second step does something silly. Originally I wrote a shell script that reconfigures the IP addresses to match the current state, but that fails as well, because all the security certificates generated in step 1 are based on the IP addresses in their CNs. This leads to a situation where the services are finally able to contact each other, but they reject the certificates and we are back to square one.

I see only two ways to solve this issue permanently:

  1. Use static IP addresses (which is problematic to do with Docker), or
  2. use host names everywhere from the very beginning.

I hope I got the idea.
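A toy model of TLS name checking shows why option 2 survives a restart. A client accepts a serving certificate only if the name or address it dialed is among the names and addresses the certificate was issued for. The certificate contents below are hypothetical examples, not what kind or kubeadm actually generate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingCert:
    """Toy certificate: only the DNS names and IPs it was issued for."""
    dns_names: frozenset[str]
    ip_addresses: frozenset[str]


def accepts(cert: ServingCert, dialed: str) -> bool:
    """Would a client that dialed `dialed` (host name or IP) accept this cert?"""
    return dialed in cert.dns_names or dialed in cert.ip_addresses


# Hypothetical cert issued at cluster creation, when the control plane had 172.18.0.2.
cert = ServingCert(
    dns_names=frozenset({"my-k8s-control-plane", "localhost"}),
    ip_addresses=frozenset({"172.18.0.2", "127.0.0.1"}),
)

print(accepts(cert, "172.18.0.3"))            # new IP after restart: False
print(accepts(cert, "my-k8s-control-plane"))  # host name: True (DNS must now point at .3)
```

With host names, only the name-to-address mapping has to change after a restart, and the certificate stays valid.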

@BenTheElder
Member

Yeah I think that's pretty much it.

We could maybe fix 2) from the first list by regenerating the certs (a bit of a headache; also, at initial bootup the kind binary orchestrated this between nodes).

I think we could also take some additional approaches re: the second list:

  1. Setting aside etcd, there are the node IPs (not a problem: core components are not listening on these; the nodes just need to report them) and the API server (this is a problem). For the API server, if we couldn't use DNS, we could use a VIP (which has its own problems), similar to the existing in-cluster kubernetes.default backing IP, and perhaps similarly for etcd.

Regarding 1.), we could also limit this to just the control-plane nodes. One additional problem for kind is that we support provisioning N clusters in parallel, in separate invocations, against the same Docker daemon. Some users currently depend on this in an even harder setup: dockerd is not on the same host as the kind binary, so coordinating IPAM would be a headache.


I think the cleanest solution is getting everything to use DNS names, but I don't remember why that's not happening right now. I'm not sure how soon I can dig into this deeper ... lots going on right now.

@BenTheElder
Member

And to further clarify: the entrypoint IP update is intended to handle places where we bind to an IP, in addition to the node reporting its IP via kubelet. That intention should be fine, because certs should also be signed for the hostname, and the hostname should be correctly mapped to the new IP. So the problem with that approach is things still connecting to IPs instead of domain names; if we can fix that, we will still want to update the local node's references to its old IP on restart.
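The node-local part of that update (rewriting references to the node's own old IP) is essentially a careful search-and-replace. Here is a minimal Python sketch. It is not kind's actual entrypoint logic, and the manifest lines are illustrative. Its only addition to plain substitution is matching whole addresses, so that 172.18.0.2 does not also rewrite 172.18.0.20.

```python
import re


def replace_ip(text: str, old_ip: str, new_ip: str) -> str:
    r"""Replace whole-address occurrences of old_ip with new_ip in text.

    (?<![\d.]) rejects a preceding digit or dot; (?!\.?\d) rejects a following
    digit or ".<digit>", so "172.18.0.2" does not match inside "172.18.0.20".
    """
    pattern = r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)"
    return re.sub(pattern, lambda _match: new_ip, text)


manifest = """\
    - --advertise-address=172.18.0.2
    server: https://172.18.0.2:6443
    # unrelated address: 172.18.0.20
"""
print(replace_ip(manifest, "172.18.0.2", "172.18.0.3"))
```

This only fixes the local node's own address. It does not help clients that still dial another node's old IP, which is why the comment points to DNS names.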

@BenTheElder BenTheElder added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 3, 2021
@aojea
Contributor

aojea commented Feb 3, 2021

but that fails as well, because all the security certificates generated in step 1 are based on the IP addresses in their CNs. This leads to a situation where the services are finally able to contact each other, but they reject the certificates and we are back to square one.

Do we know exactly which components are failing, or are ALL the components failing?
Kubelet node registration? apiserver? controller-manager?

@neolit123 do you see cases in kubeadm of people switching IPs on their cluster after it has been installed?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2021
@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 12, 2021
@markush81

/remove-lifecycle stale

@BenTheElder BenTheElder changed the title Kubernetes cluster does not start after Docker re-assigns node's IP addresses after (Docker) restart multi-node: Kubernetes cluster does not start after Docker re-assigns node's IP addresses after (Docker) restart Jul 1, 2021
@BenTheElder
Member

One thing to clarify: it's perfectly fine for the nodes to have changing IPs and for us to fix that up in some places in the config. Notably, kubelet registers and sets the node IP on the Node object, which is used for things like routing traffic to blocks of pod IPs, and that is fine to do dynamically.

What is problematic is the parts that use fixed certificates / contacting the control plane components. All of that should be using DNS, but it's not.

Unfortunately, rebootable multi-node clusters have a somewhat limited (but certainly not nonexistent!) use case, and personally this is not a priority versus other work (at the moment mostly Kubernetes work that is not even kind-related to begin with...).

The bot will not close issues in this repo now, I've disabled it for us.

A good start if someone wants to see progress on this would be identifying where IP addresses are being used.
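As a starting point, here is a hypothetical Python helper. It walks a directory tree and reports IPv4 literals that fall within the kind network's subnet. It assumes you run it inside a node container (e.g. via docker exec) or against a copy of the node's /kind and /etc/kubernetes directories. It also assumes the subnet is 172.18.0.0/16, as in the output earlier in this thread.

```python
import ipaddress
import os
import re

# Four dot-separated groups of 1-3 digits; the lookarounds reject longer dotted
# runs such as "1.2.3.4.5" while still allowing a sentence-ending period.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")


def find_ips(root: str, subnet: str) -> dict[str, set[str]]:
    """Map each file under `root` to the IPv4 literals in it that fall inside `subnet`."""
    net = ipaddress.ip_network(subnet)
    hits: dict[str, set[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and broken symlinks
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # e.g. permission denied
            for candidate in IPV4_CANDIDATE.findall(text):
                try:
                    ip = ipaddress.ip_address(candidate)
                except ValueError:
                    continue  # not a real address, e.g. "999.18.0.1"
                if ip in net:
                    hits.setdefault(path, set()).add(str(ip))
    return hits


if __name__ == "__main__":
    # Assumed paths and subnet; see the note above.
    for root in ("/kind", "/etc/kubernetes"):
        for path, ips in sorted(find_ips(root, "172.18.0.0/16").items()):
            print(path, sorted(ips))
```

Filtering on the node subnet skips addresses such as the 10.96.0.1 service IP, which does not change when Docker reassigns container addresses.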

@gagipro

gagipro commented Jul 2, 2021

Hi,

Thanks for disabling the bot!

The use case is simple: you work on a project and need a stable multi-node environment, but you have to rebuild the cluster every day or after every reboot.

The goal of kind is to simulate an environment, but if you need to rebuild it every time you reboot, that is clearly a deal-breaker for kind.

I stopped using kind and built a real cluster until this issue is solved.

thanks.
regards.

@BenTheElder
Member

BenTheElder commented Jul 2, 2021

Hi, yes if a persistent "real" cluster suits your needs kind is not what you're looking for.

The goal of kind is to simulate an environment, but if you need to rebuild it every time you reboot, that is clearly a deal-breaker for kind.

Regarding goals, see: https://kind.sigs.k8s.io/docs/contributing/project-scope/

For application testing, long-term persistence is a nice-to-have. kind is not intended to be a persistent workload cluster, and it would be insecure to keep one around permanently (persistent clusters should be upgraded regularly to keep up with security fixes ...).

See also: https://kind.sigs.k8s.io/docs/user/configuration/#api-server

Most apps should not need multi-node at all for testing; we need it to test some particular Kubernetes expectations around rolling node behavior in a few tests.

@pablodgonzalez

No news here. It's a pity! This is necessary in many contexts, for example creating, teaching, or taking courses.

@aojea
Contributor

aojea commented Dec 20, 2021

No news here. It's a pity! This is necessary in many contexts, for example creating, teaching, or taking courses.

Can you expand on that?

Why do you need to restart Docker?
Why is it not possible to create a cluster from scratch?

@pablodgonzalez

Of course, just imagine:

I start a course that lasts 2 days (when it could well be 5):

  • I created the cluster
  • I created pods, replica sets, deployments, services, etc., each one with its own YAML configuration file
  • I uploaded custom images to the cluster
  • I got the cluster into a certain state ...

And yes, the day is over and I turn off the laptop. So ... the next morning I have to recreate all of yesterday's work to continue with today's session, just because after the reboot I lost connectivity to the cluster and can't get it back online. Even a workaround would be fine. But for now, to teach, I first have to explain VirtualBox (or something else) in order to use minikube. And yes, I agree that deleting everything with kind and rebuilding it from scratch is good practice, but not for long courses where I am often in the middle of explaining a feature from one day to the next.

I think kind is almost perfect for teaching (and learning), but this issue continues to be a bit of a headache.

Best Regards

@aojea
Contributor

aojea commented Dec 20, 2021

I think kind is almost perfect for teaching (and learning), but this issue continues to be a bit of a headache.

Some people have implemented creative workarounds for it; I think some of them shared them in the Slack channels, using bash scripts with docker pause and other commands ...
Maybe it is time to look for something more standard; I have to check how to solve the IP assignment problem 🤔

@aojea
Contributor

aojea commented Dec 21, 2021

  • I created the cluster

@pablodgonzalez this happens with multi-node clusters, right?
I'm able to stop and restart Docker and keep working with single-node clusters.

@pablodgonzalez

@aojea Yes! The issue is with multi-node HA clusters with 2 or more control planes.
I thought it was because the haproxy image does not support restarts; the problem is that haproxy loses access to the control-plane containers. I did not research it in depth; I just saw the logs reporting a failed health check.

@aojea
Contributor

aojea commented Dec 21, 2021

I see, because multi-node works after restarts too:

$ docker inspect -f '{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -aq)
/kind-control-plane - 172.18.0.4
/kind-worker - 172.18.0.6
/kind-worker2 - 172.18.0.5
/vigilant_ptolemy - 172.18.0.2
/trusting_tharp - 172.18.0.3
$ sudo systemctl restart docker
$ docker inspect -f '{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -aq)
/kind-control-plane - 172.18.0.3
/kind-worker - 172.18.0.4
/kind-worker2 - 172.18.0.2
$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS    RESTARTS       AGE
kube-system          coredns-78fcd69978-cfck2                     1/1     Running   1 (110s ago)   6m23s
kube-system          coredns-78fcd69978-nl848                     1/1     Running   1 (110s ago)   6m23s
kube-system          etcd-kind-control-plane                      1/1     Running   0              97s
kube-system          kindnet-4mxrm                                1/1     Running   1 (110s ago)   6m23s
kube-system          kindnet-qj2f5                                1/1     Running   1 (110s ago)   6m4s
kube-system          kindnet-tbsrh                                1/1     Running   1 (111s ago)   6m5s
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0              97s
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   1 (110s ago)   6m36s
kube-system          kube-proxy-9kkcx                             1/1     Running   1 (110s ago)   6m4s
kube-system          kube-proxy-fxc8w                             1/1     Running   1 (110s ago)   6m5s
kube-system          kube-proxy-p7qf4                             1/1     Running   1 (110s ago)   6m23s
kube-system          kube-scheduler-kind-control-plane            1/1     Running   1 (110s ago)   6m43s
local-path-storage   local-path-provisioner-85494db59d-6sh9l      1/1     Running   2 (61s ago)    6m23s
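
A minimal Python sketch (an illustration, not part of kind) that parses the two `docker inspect` listings above and reports which kind nodes came back with a different address. The helper names `parse_inspect` and `changed_ips`, and filtering on a `kind-` name prefix, are assumptions made for this example.

```python
import re

# Matches lines printed by the command above:
#   docker inspect -f '{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' ...
LINE_RE = re.compile(r"^/(?P<name>\S+) - (?P<ip>\S+)$")


def parse_inspect(output: str) -> dict[str, str]:
    """Map container name -> IP address; lines without an IP are skipped."""
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("name")] = match.group("ip")
    return result


def changed_ips(before: str, after: str, prefix: str = "kind-") -> dict[str, tuple[str, str]]:
    """Return {node: (old_ip, new_ip)} for containers named with `prefix` whose IP changed."""
    old, new = parse_inspect(before), parse_inspect(after)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if name.startswith(prefix) and old[name] != new[name]
    }


if __name__ == "__main__":
    # Output copied from the comment above, before and after `systemctl restart docker`.
    before = "\n".join([
        "/kind-control-plane - 172.18.0.4",
        "/kind-worker - 172.18.0.6",
        "/kind-worker2 - 172.18.0.5",
        "/vigilant_ptolemy - 172.18.0.2",
        "/trusting_tharp - 172.18.0.3",
    ])
    after = "\n".join([
        "/kind-control-plane - 172.18.0.3",
        "/kind-worker - 172.18.0.4",
        "/kind-worker2 - 172.18.0.2",
    ])
    for node, (old_ip, new_ip) in changed_ips(before, after).items():
        print(f"{node}: {old_ip} -> {new_ip}")
    # kind-control-plane: 172.18.0.4 -> 172.18.0.3
    # kind-worker: 172.18.0.6 -> 172.18.0.4
    # kind-worker2: 172.18.0.5 -> 172.18.0.2
```

All three nodes moved, yet this single-control-plane cluster kept working; per the following comments, the breakage shows up with two or more control-plane nodes.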

What is the requirement for HA?

@pablodgonzalez

Mainly I use it to show how the cluster responds when a control-plane node fails. Multi-node is simple to show, but multiple control-plane nodes are an extra.
I always recommend making the cluster HA when not using a cloud provider's managed offering, so the exercise of looking at the info in the containers and showing how it works is the cherry on top.
So by always working with an HA cluster, I capture the students' attention and curiosity about the feature.

@aojea
Contributor

aojea commented Dec 21, 2021

OK, the problem is that the etcd cluster has the IPs hard-coded:

  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.18.0.2:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.18.0.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://172.18.0.2:2380
    - --initial-cluster=kind-control-plane=https://172.18.0.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.18.0.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.18.0.2:2380

Hence, when the cluster reboots and the nodes change their IPs, it blows up.
/cc @neolit123
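
To make the hard-coding concrete, a minimal Python sketch that scans a static pod manifest like the etcd one above for IPv4 addresses that no longer match the node's current address. It is a plain-text regex scan (no YAML parsing), the function name `stale_ips` is hypothetical, and it does not look at certificates or at etcd's stored member list.

```python
import ipaddress
import re

# Loose IPv4 candidate pattern; each hit is validated with ipaddress below.
IPV4_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def stale_ips(manifest: str, current_ip: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for lines that mention an IP other than current_ip.

    Loopback (127.0.0.1) and unspecified (0.0.0.0) addresses are ignored,
    since they stay valid when Docker hands the node a new address.
    """
    hits = []
    for lineno, line in enumerate(manifest.splitlines(), start=1):
        for candidate in IPV4_RE.findall(line):
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. "300.1.1.1" matches the regex but is not an address
            if ip.is_loopback or ip.is_unspecified:
                continue
            if str(ip) != current_ip:
                hits.append((lineno, line.strip()))
                break  # report each line at most once
    return hits


if __name__ == "__main__":
    # Flags copied from the etcd manifest above.
    manifest = "\n".join([
        "- --advertise-client-urls=https://172.18.0.2:2379",
        "- --listen-client-urls=https://127.0.0.1:2379,https://172.18.0.2:2379",
        "- --listen-metrics-urls=http://127.0.0.1:2381",
        "- --listen-peer-urls=https://172.18.0.2:2380",
    ])
    # Suppose Docker gave this node 172.18.0.3 after the restart (hypothetical).
    for lineno, text in stale_ips(manifest, "172.18.0.3"):
        print(f"line {lineno}: {text}")  # reports lines 1, 2 and 4
```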

@neolit123
Member

neolit123 commented Dec 21, 2021 via email

@aojea
Contributor

aojea commented Dec 21, 2021

Changing the IPs of CP nodes would require updating static pods on disk, and also changing annotations on the mirror pods. Perhaps there is a way to reserve Docker IPs on restart instead of adapting to IP changes.

What about using DNS names instead of IPs?
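
The "updating static pods on disk" part of the reply quoted above can be sketched as a boundary-aware text substitution. This is a hypothetical helper, assuming you have already read the manifest (kubeadm's default static pod directory is /etc/kubernetes/manifests) and will write it back yourself. As the reply notes, it does not touch the mirror pod annotations; it also does not update certificate SANs or etcd's recorded peer URLs, so on its own it is not a fix.

```python
import re


def replace_ip(text: str, old_ip: str, new_ip: str) -> tuple[str, int]:
    """Replace whole occurrences of old_ip in text; return (new_text, count).

    The lookarounds stop 172.18.0.2 from matching inside 172.18.0.20,
    which a plain str.replace would rewrite by mistake.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)")
    return pattern.subn(new_ip, text)


if __name__ == "__main__":
    line = "- --initial-cluster=kind-control-plane=https://172.18.0.2:2380"
    updated, count = replace_ip(line, "172.18.0.2", "172.18.0.3")
    print(updated, count)
```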

@neolit123
Member

kube-* components have flags that only work with IPs. Those could be set to 0.0.0.0, but the mirror pod annotations would still have to be updated after IP detection.
Kubeadm uses static etcd bootstrap, which uses IPs. DNS bootstrap is not supported by kubeadm with a toggle, but users can opt into it using flags.

https://etcd.io/docs/v3.5/dev-internal/discovery_protocol/

@boldandbusted

Of course, just imagine

I start a course that lasts 2 days (when it could well be 5),

  • I created the cluster
  • I created pods, ReplicaSets, Deployments, Services, etc., each one with its own YAML configuration file
  • I uploaded custom images to the cluster
  • I got the cluster into a certain state...
    Then the day is over and I turn off the laptop.
    So... the next morning I have to recreate all of yesterday's work to continue with today's course, just because on reboot I lost connectivity to the cluster and can't get it back online.
    Even just having a workaround available would be fine.
    But for now, to teach, I first have to explain VirtualBox (or something else) in order to use minikube.
    And yes, I agree that deleting everything with kind and rebuilding it is good practice, but not for long courses where I am often explaining a feature across several days.

I think kind is almost perfect for teaching (and learning), but this issue continues to be a bit of a headache.

Best Regards

Howdy. I was just pointed here from Slack because I had a KinD cluster with two control-plane nodes and was perplexed as to why it didn't survive a reboot of the host OS. I saw your comment, and while it is a bit of a pain to set up on the first day, you and your students could use Vagrant to manage suspending and bringing up a VM with a KinD cluster in the guest OS as a workaround. I shared what I use to manage the setup here: https://github.com/boldandbusted/vagrant-kind . Cheers.

@pablodgonzalez

@boldandbusted Thanks for sharing your repo, but the idea is to avoid installing any VMs or spending time explaining other tools.
Given the target audience (many students are developers, architects, and sometimes decision makers), I want to focus on Kubernetes and its benefits without noise from other tools or setups.
For now I have a config with multiple control-plane nodes that I use toward the end of the course, but it loses the charm of working hands-on from the beginning and discovering it for yourself.

@BenTheElder
Member

BenTheElder commented May 3, 2022

Unfortunately it's been difficult to track this issue as discussion has veered off topic and across many issues/threads.

See: #2671 for recent discussion on possible solutions.

Some discussions are in #2045

@BenTheElder
Member

This should be fixed for most multi-node clusters in the latest sources at HEAD, and in the forthcoming v0.15.0 (TBD, we'll want to wrap up some other things and make sure this is working widely before cutting a release).

#1689 remains for tracking clusters with multiple control-plane nodes ("HA") which we haven't dug into yet.

@BenTheElder BenTheElder assigned BenTheElder and unassigned aojea May 26, 2022
@BenTheElder
Member

(Thanks @tnqn !)


10 participants