Pod doesn't come up when pod cidr prefix is /24 #2102

Closed
kahou82 opened this issue Aug 6, 2018 · 8 comments


kahou82 commented Aug 6, 2018

My pods cannot come up when I configure the Calico pod CIDR with a /24 prefix (I have tried different CIDRs such as 192.168.224.0/24, 192.168.0.0/24, etc.).

Expected Behavior

I am not sure whether a /24 pod CIDR is supported, so I am not sure what the expected behavior should be.

Current Behavior

From the calico-node pod, I see the following error:
2018-08-06 23:01:08.025 [WARNING][69] node.go 224: Node kahou-tenant-test-3-worker312e336341 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-06 23:01:09.058 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-06 23:01:10.149 [WARNING][69] node.go 224: Node kahou-tenant-test-3-worker83211adf2c does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-06 23:01:13.942 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}

Possible Solution

Steps to Reproduce (for bugs)

  1. Set up a k8s cluster (I am using 1.10)
  2. Set up the Calico CNI with a pod CIDR of 192.168.224.0/24 (see the sketch after this list)
  3. When the cluster is ready, spin up a pod. You will see the pod stuck in the "ContainerCreating" stage
  4. Go to the calico-node pod log and you will see the above error.
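
A minimal reproduction sketch, assuming kubeadm (which the reporter confirms further down); calico.yaml stands in for whatever Calico manifest is applied:

  kubeadm init --pod-network-cidr=192.168.224.0/24
  kubectl apply -f calico.yaml
  kubectl run test --image=nginx     # pod stays stuck in ContainerCreating
  kubectl describe pod -l run=test   # Events show the sandbox/CNI failure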

Context

Your Environment

  • Calico version
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
  • Operating System and version:
  • Link to your project (optional):
ozdanborne (Member) commented:

/24 cidrs are valid.

@kahou82 to see why a pod is stuck in ContainerCreating, please run kubectl describe pod on the pod.
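
For example (a generic sketch; the pod and namespace names are placeholders, not from this thread):

  kubectl describe pod <pod-name> -n <namespace>
  kubectl get events --sort-by=.lastTimestamp

The Events section at the bottom of the describe output usually shows why sandbox or CNI setup is failing.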


kahou82 commented Aug 11, 2018

@ozdanborne I see the following error from calico node pod:

2018-08-11 17:18:26.038 [INFO][70] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=1.164562
2018-08-11 17:18:27.876 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:30.435 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:31.733 [WARNING][68] node.go 224: Node calico-123-worker8fff5bb127 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-11 17:18:32.292 [WARNING][68] node.go 224: Node calico-123-workerf16295f162 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-11 17:18:36.940 [INFO][70] int_dataplane.go 733: Applying dataplane updates
2018-08-11 17:18:36.940 [INFO][70] ipsets.go 222: Asked to resync with the dataplane on next update. family="inet"
2018-08-11 17:18:36.940 [INFO][70] ipsets.go 253: Resyncing ipsets with dataplane. family="inet"
2018-08-11 17:18:36.941 [INFO][70] ipsets.go 295: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=929.699µs
2018-08-11 17:18:36.941 [INFO][70] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=1.110725
2018-08-11 17:18:37.408 [INFO][70] int_dataplane.go 733: Applying dataplane updates
2018-08-11 17:18:37.408 [INFO][70] route_table.go 189: Queueing a resync of routing table. ipVersion=0x4
2018-08-11 17:18:37.408 [INFO][70] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=0.22300099999999998
2018-08-11 17:18:37.876 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:40.435 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:41.752 [WARNING][68] node.go 224: Node calico-123-worker8fff5bb127 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-11 17:18:42.301 [WARNING][68] node.go 224: Node calico-123-workerf16295f162 does not have podCIDR to use to calculate the IPIP Tunnel Address

tmjd (Member) commented Aug 13, 2018

  1. Setup calico cni with pod cidr as 192.168.224.0/24
    How are you doing that step?

From the logs you pasted I believe you are using the Kubernetes API datastore; in that mode Kubernetes is responsible for allocating the podCIDR for each node. Are you specifying the pod CIDR range to Kubernetes, and do you have the flag set that tells it to allocate CIDRs to the nodes?
See the first step here: https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less
If you do have that specified to K8s, I think you should look at the Kubernetes controller manager logs and see if it has any errors around the CIDR.
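
For reference (a generic sketch of the standard flags involved, not configuration taken from this thread), node CIDR allocation is controlled by kube-controller-manager flags along these lines:

  --allocate-node-cidrs=true
  --cluster-cidr=192.168.224.0/24
  --node-cidr-mask-size=24    # default size of each node's podCIDR

and you can check what each node was actually assigned with:

  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

Nodes printing an empty podCIDR match the "does not have podCIDR" warnings in the calico-node logs.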


kahou82 commented Aug 14, 2018

@tmjd I am using kubeadm and set --pod-network-cidr=192.168.224.0/24. It looks like --pod-network-cidr will set both --cluster-cidr and --allocate-node-cidrs?
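
A quick way to confirm that (a sketch, assuming the standard kubeadm layout on the control-plane node):

  grep -E 'cluster-cidr|allocate-node-cidrs|node-cidr-mask-size' \
      /etc/kubernetes/manifests/kube-controller-manager.yaml

With --pod-network-cidr=192.168.224.0/24 you should see --cluster-cidr=192.168.224.0/24 and --allocate-node-cidrs=true in that manifest; whether --node-cidr-mask-size is set explicitly depends on the kubeadm version.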


KillerX commented Aug 16, 2018

@kahou82 I have the same problem as you, and it appears to originate from the fact that the control plane is assigning /24 networks to the nodes. See: Azure/acs-engine#1573 (comment)
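
For context (my reading of the linked comment, not something confirmed elsewhere in this thread): the controller manager carves a per-node subnet out of the cluster CIDR, and that per-node subnet defaults to a /24, so a /24 cluster CIDR only has room for a single node. A possible workaround is either a larger cluster CIDR or a smaller per-node subnet, e.g. on the kube-controller-manager:

  --cluster-cidr=192.168.224.0/24
  --allocate-node-cidrs=true
  --node-cidr-mask-size=26    # four /26 node subnets instead of one /24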

But while the "does not have podCIDR" issue is now gone, the pod is still crashing, with the same error in the pod's describe output:

Events:
  Type     Reason     Age                From         Message
  ----     ------     ----               ----         -------
  Normal   Pulled     12m                kubelet, k1  Container image "quay.io/calico/cni:v3.2.0" already present on machine
  Normal   Created    12m                kubelet, k1  Created container
  Normal   Started    12m                kubelet, k1  Started container
  Normal   Started    11m (x2 over 12m)  kubelet, k1  Started container
  Normal   Pulled     10m (x3 over 12m)  kubelet, k1  Container image "quay.io/calico/node:v3.2.0" already present on machine
  Normal   Created    10m (x3 over 12m)  kubelet, k1  Created container
  Normal   Killing    10m (x2 over 11m)  kubelet, k1  Killing container with id docker://calico-node:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  7m (x26 over 12m)  kubelet, k1  Liveness probe failed: Get http://localhost:9099/liveness: dial tcp [::1]:9099: connect: connection refused
  Warning  BackOff    2m (x12 over 5m)   kubelet, k1  Back-off restarting failed container

This is happening on both the master and a node that I joined. These are the last lines in the pod log:

2018-08-16 10:44:11.834 [INFO][45] int_dataplane.go 734: Applying dataplane updates
2018-08-16 10:44:11.834 [INFO][45] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2018-08-16 10:44:11.835 [INFO][45] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"
2018-08-16 10:44:11.839 [INFO][45] ipsets.go 304: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=4.143808ms
2018-08-16 10:44:11.840 [INFO][45] int_dataplane.go 748: Finished applying updates to dataplane. msecToApply=5.457275999999999
2018-08-16 10:44:18.616 [INFO][45] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-16 10:44:22.681 [INFO][45] int_dataplane.go 734: Applying dataplane updates
2018-08-16 10:44:22.681 [INFO][45] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2018-08-16 10:44:22.681 [INFO][45] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"
2018-08-16 10:44:22.684 [INFO][45] ipsets.go 304: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=2.555223ms
2018-08-16 10:44:22.684 [INFO][45] int_dataplane.go 748: Finished applying updates to dataplane. msecToApply=3.393957
2018-08-16 10:44:28.586 [INFO][45] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}

I would be very grateful if anyone could provide more insight into what could be causing this, or how to get more information about what is going on.

caseydavenport (Member) commented:

Warning Unhealthy 7m (x26 over 12m) kubelet, k1 Liveness probe failed: Get http://localhost:9099/liveness: dial tcp [::1]:9099: connect: connection refused

Looks like the liveness probe is failing. Not sure, but might be because it's picking an IPv6 localhost while the node container is using IPv4?
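
A minimal sketch of forcing the probe onto IPv4 in the calico-node DaemonSet (assuming the standard manifest; the surrounding probe settings here are illustrative, not copied from this thread):

      livenessProbe:
        httpGet:
          host: 127.0.0.1    # pin to IPv4 instead of resolving "localhost"
          path: /liveness
          port: 9099
        periodSeconds: 10
        initialDelaySeconds: 10
        failureThreshold: 6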


KillerX commented Aug 16, 2018

@caseydavenport I have rebuilt the master from scratch and replaced the localhost in the yaml file with 127.0.0.1 to force IPv4, but the result remains unchanged:

Warning  Unhealthy  13s (x6 over 1m)  kubelet, k1  Liveness probe failed: Get http://127.0.0.1:9099/liveness: dial tcp 127.0.0.1:9099: connect: connection refused

And after this the node enters a kill/start loop again.

I'll dig around some more and open a new issue if I can't figure anything out, as the problem I'm facing now seems unrelated to the original issue. Thanks for helping so far!

caseydavenport (Member) commented:

Cool - I'll close this for now since we've gotten past the original issue.

Do raise another if needed!
