This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Tweaks to network config #1407

Closed
cknowles opened this issue Jul 15, 2018 · 4 comments

@cknowles
Contributor

cknowles commented Jul 15, 2018

While looking at some issues with CNI and hostPort connections mentioned in #704 (comment), I found some discrepancies between the kube-aws network configuration and what's expected or perhaps optimal. I've since discovered the cause of my hostPort issues was something unrelated, but the items I found are likely worth including in kube-aws. I'll list them here first and we can always split them up later.

  1. We are not setting the MTU for Calico in the CNI config. This means that for Canal the MTU of the flannel interface is 8951 while all the Calico interfaces are at 1500. It still works, but I believe it's not the optimal value. According to the Calico docs, we should set this to 8951 to match flannel (see the config sketch after this list):

When using flannel for networking, the MTU for the network interfaces should match the MTU of the flannel interface. In the above table the 4th column “Calico MTU with VXLAN” is the expected MTU when using flannel configured with VXLAN.

  2. We should set externalSetMarkChain to KUBE-MARK-MASQ in the CNI config to reuse the existing iptables chain; it's included in the sketch below. It's not in the default YAMLs, but I found it here. It seems our friend redbaron found the same:

externalSetMarkChain - string, default nil. If you already have a Masquerade mark chain (e.g. Kubernetes), specify it here. This will use that instead of creating a separate chain. When this is set, markMasqBit must be unspecified.

  3. We should set the --service-cluster-ip-range flag on the controller manager, as it prevents IPs from being assigned in the service CIDR range should it overlap with the pod CIDR range. This doesn't seem to be documented anywhere I could find, but most bootstrap instructions set it, and I just found that we don't. I dug into the code and found this. A sketch of the flag follows the list.

  4. When using Canal or flannel, we should be defaulting the podCIDR to 10.244.0.0/16; when using Calico it should be 192.168.0.0/16 (see the cluster.yaml sketch below). References: Calico, Canal, kubeadm instructions and code. I'm wondering what the impact of setting these to anything else is, as our default right now is 10.2.0.0/16 and it appears to work for basic functionality. Perhaps the impact is in network policy enforcement?

  5. We should update cniVersion from 0.3.0 to 0.3.1 (reflected in the sketch below).
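To make items 1, 2 and 5 concrete, here's a minimal sketch of what the Canal CNI conflist could end up looking like. It's illustrative only: the plugin layout, the ipam settings and the kubeconfig path are assumptions rather than kube-aws's actual rendered config (JSON doesn't allow comments, so the assumptions are flagged here instead). Note externalSetMarkChain sits in the portmap plugin entry, since that's the plugin handling hostPort:

```json
{
  "name": "canal",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "mtu": 8951,
      "policy": { "type": "k8s" },
      "ipam": { "type": "host-local", "subnet": "usePodCidr" },
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true },
      "externalSetMarkChain": "KUBE-MARK-MASQ"
    }
  ]
}
```

Per the portmap docs quoted above, when externalSetMarkChain is set, markMasqBit must be left unspecified.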
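For item 3, the flag would land in the controller-manager arguments, roughly as below. The hyperkube invocation and the 10.3.0.0/24 service CIDR are assumptions based on common kube-aws setups; --cluster-cidr and --allocate-node-cidrs are the standard companion flags:

```yaml
# Sketch of kube-controller-manager args (values illustrative, not kube-aws's actual manifest)
command:
  - /hyperkube
  - controller-manager
  - --allocate-node-cidrs=true
  - --cluster-cidr=10.2.0.0/16              # current kube-aws default pod CIDR
  - --service-cluster-ip-range=10.3.0.0/24  # assumed service CIDR; must not overlap the pod CIDR
```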
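And for item 4, in cluster.yaml terms the suggestion would look roughly like this (key names assumed; whether we should actually change the default is the open question):

```yaml
# Illustrative cluster.yaml excerpt; 10.244.0.0/16 per the flannel/Canal references above
podCIDR: "10.244.0.0/16"      # suggested default for flannel/Canal
# podCIDR: "192.168.0.0/16"   # suggested default for plain Calico
serviceCIDR: "10.3.0.0/24"    # assumed value; must not overlap podCIDR
```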

@cknowles cknowles self-assigned this Jul 15, 2018
@cknowles cknowles added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 15, 2018
cknowles pushed a commit to cknowles/kube-aws that referenced this issue Jul 15, 2018
Fixes kubernetes-retired#1407.

For now just document the `podCIDR` aspects but we should probably switch the default based on the networking setup, possibly combine this config with the `selfHosting` config into a networking section.
@davidmccormick
Contributor

davidmccormick commented Jul 16, 2018

Hi

Good catch. I vote for all of the above, except 4: I don't rightly understand the significance of using one CIDR range over another. Can't we decide our own CIDR and then accommodate the defaults and kube manifests around that choice? What is the advantage of using 192.168.0.0/16 with Calico?

We choose our own CIDR ranges, so I don't think we ever keep the default settings.

@cknowles
Contributor Author

@davidmccormick I'm not sure at this stage about 4, or why every set of instructions I found seems to indicate it has to be set to that. At least on the surface our defaults work right now, so I assume there's either some other subtle issue or everyone has just copied the instructions verbatim 😆 (including kubeadm).

@davidmccormick
Contributor

:) I think we should stick with the same default podCIDR unless we find hard evidence to change it.
All the others sound like things we should merge ASAP.

@cknowles
Contributor Author

cknowles commented Jul 17, 2018

Sure, we can at least do that and have the other bit ready.

One thing though: I'm struggling to identify the cause of a full cluster outage, so I asked for the PR not to be merged yet (see the message there; it's either related to this change or to #1397). Nodes started marking themselves as NotReady intermittently, and sometimes NotSchedulable, until gradually all of them fell over. A dev cluster with a few nodes did not experience that issue, which is why we promoted the changes to a larger cluster with more use, and that's where the problems started. Actually, the dev cluster did experience it on one node just after we had successfully restored the downed cluster; after killing that one node, the dev cluster has been fine for another couple of days.

Surprisingly, for the downed cluster, a brand new cluster using a backup/restore via Ark didn't work; the restored pods appeared to make the new cluster fragile in some way. Redeployed everything from source and it was healthy. I suspect it was the CIDR change, although it could be the newer Calico or the other network changes. I did follow the instructions from your recent PR supporting CIDR changes with cluster downtime. I also found a few Kubernetes issues about waiting for CNI plugins to respond which seem to have similar symptoms.

We still have the node problem detector ticket in the kube-aws backlog of course, but that'll only patch over the root cause.

mumoshu pushed a commit that referenced this issue Sep 16, 2018
* Tweaks to network config

Fixes #1407.

For now just document the `podCIDR` aspects but we should probably switch the default based on the networking setup, possibly combine this config with the `selfHosting` config into a networking section.

* Correct pod CIDR notes