Pod doesn't come up when pod cidr prefix is /24 #2102

Closed
kahou82 opened this issue Aug 6, 2018 · 8 comments


kahou82 commented Aug 6, 2018

My pods cannot come up when I configure the Calico pod CIDR with a /24 prefix (I have tried different CIDRs such as 192.168.224.0/24, 192.168.0.0/24, etc.).

Expected Behavior

I am not sure whether a /24 pod CIDR is supported, so I am not sure what the expected behavior should be.

Current Behavior

From the calico-node pod, I see the following error:
2018-08-06 23:01:08.025 [WARNING][69] node.go 224: Node kahou-tenant-test-3-worker312e336341 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-06 23:01:09.058 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-06 23:01:10.149 [WARNING][69] node.go 224: Node kahou-tenant-test-3-worker83211adf2c does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-06 23:01:13.942 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}

Possible Solution

Steps to Reproduce (for bugs)

  1. Set up a k8s cluster (I am using 1.10)
  2. Set up the Calico CNI with a pod CIDR of 192.168.224.0/24 (see the sketch after this list)
  3. When the cluster is ready, spin up a pod. You will see the pod stuck in the "ContainerCreating" stage
  4. Go to the calico-node pod log and you will see the above error.
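
A minimal reproduction sketch, assuming kubeadm (which the reporter confirms further down); calico.yaml stands in for whatever Calico manifest is applied:

  kubeadm init --pod-network-cidr=192.168.224.0/24
  kubectl apply -f calico.yaml
  kubectl run test --image=nginx     # pod stays stuck in ContainerCreating
  kubectl describe pod -l run=test   # Events show the sandbox/CNI failure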

Context

Your Environment

  • Calico version
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
  • Operating System and version:
  • Link to your project (optional):
ozdanborne (Member) commented:

/24 cidrs are valid.

@kahou82 to see why a pod is stuck in ContainerCreating, please run kubectl describe pod on the pod.
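
For example (a generic sketch; the pod and namespace names are placeholders, not from this thread):

  kubectl describe pod <pod-name> -n <namespace>
  kubectl get events --sort-by=.lastTimestamp

The Events section at the bottom of the describe output usually shows why sandbox or CNI setup is failing.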


kahou82 commented Aug 11, 2018

@ozdanborne I see the following error from calico node pod:

2018-08-11 17:18:26.038 [INFO][70] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=1.164562
2018-08-11 17:18:27.876 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:30.435 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:31.733 [WARNING][68] node.go 224: Node calico-123-worker8fff5bb127 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-11 17:18:32.292 [WARNING][68] node.go 224: Node calico-123-workerf16295f162 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-11 17:18:36.940 [INFO][70] int_dataplane.go 733: Applying dataplane updates
2018-08-11 17:18:36.940 [INFO][70] ipsets.go 222: Asked to resync with the dataplane on next update. family="inet"
2018-08-11 17:18:36.940 [INFO][70] ipsets.go 253: Resyncing ipsets with dataplane. family="inet"
2018-08-11 17:18:36.941 [INFO][70] ipsets.go 295: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=929.699µs
2018-08-11 17:18:36.941 [INFO][70] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=1.110725
2018-08-11 17:18:37.408 [INFO][70] int_dataplane.go 733: Applying dataplane updates
2018-08-11 17:18:37.408 [INFO][70] route_table.go 189: Queueing a resync of routing table. ipVersion=0x4
2018-08-11 17:18:37.408 [INFO][70] int_dataplane.go 747: Finished applying updates to dataplane. msecToApply=0.22300099999999998
2018-08-11 17:18:37.876 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:40.435 [INFO][70] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-11 17:18:41.752 [WARNING][68] node.go 224: Node calico-123-worker8fff5bb127 does not have podCIDR to use to calculate the IPIP Tunnel Address
2018-08-11 17:18:42.301 [WARNING][68] node.go 224: Node calico-123-workerf16295f162 does not have podCIDR to use to calculate the IPIP Tunnel Address

tmjd (Member) commented Aug 13, 2018

  1. Setup calico cni with pod cidr as 192.168.224.0/24
    How are you doing that step?

From the logs you pasted I believe you are using the Kubernetes API datastore; in that mode Kubernetes is responsible for allocating the podCIDR for each node. Are you specifying the pod CIDR range to Kubernetes, and do you have the flag set that tells it to allocate CIDRs to the nodes?
See the first step here: https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less
If you do have that specified to K8s, I think you should look at the Kubernetes controller manager logs and see if it has any errors around the CIDR.
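
For reference (a generic sketch of the standard flags involved, not configuration taken from this thread), node CIDR allocation is controlled by kube-controller-manager flags along these lines:

  --allocate-node-cidrs=true
  --cluster-cidr=192.168.224.0/24
  --node-cidr-mask-size=24    # default size of each node's podCIDR

and you can check what each node was actually assigned with:

  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

Nodes printing an empty podCIDR match the "does not have podCIDR" warnings in the calico-node logs.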


kahou82 commented Aug 14, 2018

@tmjd I am using kubeadm and set --pod-network-cidr=192.168.224.0/24. It looks like --pod-network-cidr will set both --cluster-cidr and --allocate-node-cidrs?
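
A quick way to confirm that (a sketch, assuming the standard kubeadm layout on the control-plane node):

  grep -E 'cluster-cidr|allocate-node-cidrs|node-cidr-mask-size' \
      /etc/kubernetes/manifests/kube-controller-manager.yaml

With --pod-network-cidr=192.168.224.0/24 you should see --cluster-cidr=192.168.224.0/24 and --allocate-node-cidrs=true in that manifest; whether --node-cidr-mask-size is set explicitly depends on the kubeadm version.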


KillerX commented Aug 16, 2018

@kahou82 I have the same problem as you, and it appears to originate from the fact that the control plane is assigning /24 networks to the nodes. See: Azure/acs-engine#1573 (comment)
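
For context (my reading of the linked comment, not something confirmed elsewhere in this thread): the controller manager carves a per-node subnet out of the cluster CIDR, and that per-node subnet defaults to a /24, so a /24 cluster CIDR only has room for a single node. A possible workaround is either a larger cluster CIDR or a smaller per-node subnet, e.g. on the kube-controller-manager:

  --cluster-cidr=192.168.224.0/24
  --allocate-node-cidrs=true
  --node-cidr-mask-size=26    # four /26 node subnets instead of one /24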

But while the "does not have podCIDR" issue is now gone, the pod is still crashing, with the same error in the pod's describe output:

Events:
  Type     Reason     Age                From         Message
  ----     ------     ----               ----         -------
  Normal   Pulled     12m                kubelet, k1  Container image "quay.io/calico/cni:v3.2.0" already present on machine
  Normal   Created    12m                kubelet, k1  Created container
  Normal   Started    12m                kubelet, k1  Started container
  Normal   Started    11m (x2 over 12m)  kubelet, k1  Started container
  Normal   Pulled     10m (x3 over 12m)  kubelet, k1  Container image "quay.io/calico/node:v3.2.0" already present on machine
  Normal   Created    10m (x3 over 12m)  kubelet, k1  Created container
  Normal   Killing    10m (x2 over 11m)  kubelet, k1  Killing container with id docker://calico-node:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  7m (x26 over 12m)  kubelet, k1  Liveness probe failed: Get http://localhost:9099/liveness: dial tcp [::1]:9099: connect: connection refused
  Warning  BackOff    2m (x12 over 5m)   kubelet, k1  Back-off restarting failed container

This is happening on both the master and a node that I joined. These are the last lines in the pod log:

2018-08-16 10:44:11.834 [INFO][45] int_dataplane.go 734: Applying dataplane updates
2018-08-16 10:44:11.834 [INFO][45] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2018-08-16 10:44:11.835 [INFO][45] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"
2018-08-16 10:44:11.839 [INFO][45] ipsets.go 304: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=4.143808ms
2018-08-16 10:44:11.840 [INFO][45] int_dataplane.go 748: Finished applying updates to dataplane. msecToApply=5.457275999999999
2018-08-16 10:44:18.616 [INFO][45] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
2018-08-16 10:44:22.681 [INFO][45] int_dataplane.go 734: Applying dataplane updates
2018-08-16 10:44:22.681 [INFO][45] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2018-08-16 10:44:22.681 [INFO][45] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"
2018-08-16 10:44:22.684 [INFO][45] ipsets.go 304: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=2.555223ms
2018-08-16 10:44:22.684 [INFO][45] int_dataplane.go 748: Finished applying updates to dataplane. msecToApply=3.393957
2018-08-16 10:44:28.586 [INFO][45] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}

I would be very grateful if anyone could provide more insight into what could be causing this, or how to get more information about what is going on.

caseydavenport (Member) commented:

Warning Unhealthy 7m (x26 over 12m) kubelet, k1 Liveness probe failed: Get http://localhost:9099/liveness: dial tcp [::1]:9099: connect: connection refused

Looks like the liveness probe is failing. Not sure, but might be because it's picking an IPv6 localhost while the node container is using IPv4?
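
A minimal sketch of forcing the probe onto IPv4 in the calico-node DaemonSet (assuming the standard manifest; the surrounding probe settings here are illustrative, not copied from this thread):

      livenessProbe:
        httpGet:
          host: 127.0.0.1    # pin to IPv4 instead of resolving "localhost"
          path: /liveness
          port: 9099
        periodSeconds: 10
        initialDelaySeconds: 10
        failureThreshold: 6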


KillerX commented Aug 16, 2018

@caseydavenport I have rebuilt the master from scratch and replaced the localhost in the yaml file with 127.0.0.1 to force IPv4, but the result remains unchanged:

Warning  Unhealthy  13s (x6 over 1m)  kubelet, k1  Liveness probe failed: Get http://127.0.0.1:9099/liveness: dial tcp 127.0.0.1:9099: connect: connection refused

And after this the node enters a kill/start loop again.

I'll dig around some more and open a new issue if I can't figure anything out, as the problem I'm facing now seems unrelated to the original issue. Thanks for helping so far!

caseydavenport (Member) commented:

Cool - I'll close this for now since we've gotten past the original issue.

Do raise another if needed!
