-
Notifications
You must be signed in to change notification settings - Fork 218
Add network provider option and Calico support #714
Conversation
cmd/bootkube/render.go
Outdated
@@ -50,6 +50,8 @@ var ( | |||
serviceCIDR string | |||
selfHostKubelet bool | |||
cloudProvider string | |||
networkProvider string | |||
networkMTU int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get away with a sane default here rather than a flag? I'd really like to avoid this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I was on the fence about this. An appropriate default is 1440. Real users would pick the right value depending on the environment or network performance suffers. Since bootkube render
just produces valid example manifests (not ideal ones), perhaps we leave it to projects to generate assets that use the correct MTU. In my cases, I generate the right assets for the platform separately, so I'm not opinionated which way bootkube render
goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. flannel and Calico use different defaults, so setting it in one place breaks tests anyway. This means bootkube render
will create clusters that work using the default settings, but aren't necessarily getting optimal throughput. I think that's ok given that higher level tools do handle these cases.
cmd/bootkube/render.go
Outdated
return errors.New("Must specify --network-provider flannel or calico") | ||
} | ||
if renderOpts.networkProvider == asset.NetworkCalico && renderOpts.calicoNetworkPolicy { | ||
return errors.New("Cannot specify --experimental-calico-network-policy with --network-provider=calico, Calico supports network policy automatically.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we just drop this altogether and put this under network provider instead?
--network-provider={flannel,experimental-calico-network-policy, experimiental-calico-bgp}
?
I'm leaning toward the experimental because if we don't have tests - we could break it at any time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experimental prefixes work for me. I prefer we say "calico" rather than "calico-bgp", since bgp is redundant. I'd also prefer to keep "canal-style" policy-only projects separate from CNI providers, long term it doesn't really belong here since its not (supposed to be) providing networking or really essential to bringing up a minimal control plane. Suppose some other network policy app comes along. Its an addon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--network-provider={flannel,experimental-calico}
Should we add a brief pros/cons (/use-case) of flannel vs calico? For a newcomer I think it can be a bit hard to know what is "best". |
coreosbot run e2e calico |
coreosbot run e2e |
@klausenbusk to a degree, though maybe we just link to the relevant projects. So kubernetes-incubator projects stay clear of providing recommendations and leave it to users or to higher level tools to choose what's right for them. Practically speaking, this Calico support in bootkube is experimental so flannel is still the supported option. |
Re-enabling broken tests (#716) finds some issues in this PR :) I'll investigate. |
Squashed and rebased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking good - but I'd like to move the "calico network policy" under the same umbrella (as "canal")
cmd/bootkube/render.go
Outdated
@@ -72,6 +73,7 @@ func init() { | |||
cmdRender.Flags().StringVar(&renderOpts.serviceCIDR, "service-cidr", "10.3.0.0/24", "The CIDR range of cluster services.") | |||
cmdRender.Flags().BoolVar(&renderOpts.selfHostKubelet, "experimental-self-hosted-kubelet", false, "(Experimental) Create a self-hosted kubelet daemonset.") | |||
cmdRender.Flags().StringVar(&renderOpts.cloudProvider, "cloud-provider", "", "The provider for cloud services. Empty string for no provider") | |||
cmdRender.Flags().StringVar(&renderOpts.networkProvider, "network-provider", "flannel", "CNI network provider (flannel or experimental-calico).") | |||
cmdRender.Flags().BoolVar(&renderOpts.selfHostedEtcd, "experimental-self-hosted-etcd", false, "(Experimental) Create self-hosted etcd assets.") | |||
cmdRender.Flags().BoolVar(&renderOpts.calicoNetworkPolicy, "experimental-calico-network-policy", false, "(Experimental) Add network policy support by calico.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should drop this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or really this should be moved under networkProvider={flannel, experimental-calico, experimental-canal}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this and change the testing setup/scripts to work that way (Jenkins jobs pass enable_network_policy to terraform to scripts to bootkube render so the jobs need changes). It may be worth a separate PR.
Concerning whether we should, imo flannel with policy-only Calico (i.e. "canal") probably shouldn't be conflated with a top level CNI network provider. Policy-only Calico manifests can be deployed on top of a flannel cluster afterward, as an addon, no reason it needs to be part of control plane bootstrapping or a bootkube concern. I'd prefer we eventually remove --experimental-calico-network-policy
and just point to the upstream plain-old-manifests that work fine. Its the same logic that governs why bootkube clusters don't come with Ingress or CLUO or other optional addons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd never recommend installing network policy after the fact (even if you could probably make it work). Essentially you'd end up relying on the name ordering of CNI configuration, (and that already makes me nervous), as well as deployed CNI plugin versions / compatibility. Basically -- not worth the risk (nor would we try and support that here). Networking IMO should only be considered an install-time choice.
To that end - I would much prefer that we have a single flag for the network provider rather than separating those concepts.
What about a separate PR (that comes before this) that changes the flag from:
--experimental-calico-network-policy=true
to
--network-provider=experimental-canal
Then this as a follow-up is merely adding a new provider?
Might also consider a minor version bump for this given the flag change. But given that it's experimental (and breaking changes should be expected), I could go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd never recommend installing network policy after the fact... not worth the risk
Calico policy-only manifests are jumbled up with control plane manifests and get created in random order. Their install-cni sidecar races with flannel's sidecar. Creating these manifests early shouldn't provide any sense of safety. In my view, any of these setups that involve flannel with policy-only calico are racy.
nor would we try and support that here
If anything, I want the opposite. Less support responsibility for bootkube. I'd prefer we point to upstream to (try to) handle policy-only mode since their docs cover adding it to a flannel cluster. Elevating it to a top level network provider in bootkube means we continue supporting it. That's fine if we wanna keep trying, its just not a mode I'd encourage users to choose myself.
Basically just gonna re-do this |
Force override old PR. |
coreosbot run e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. One question about a bash debug option.
hack/quickstart/init-master.sh
Outdated
@@ -1,5 +1,5 @@ | |||
#!/usr/bin/env bash | |||
set -euo pipefail | |||
set -euox pipefail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to leave the -x
option, or was that during debugging? Or if it's generally useful should we turn it on for init-node.sh
also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was for debugging, I'll remove, though its a bit useful
* Add network_provider to e2e scripts / Terraform setup
Failed was related to the Jenkins testing log collection setup, which was different on the e2e-calico test. |
coreosbot run e2e |
coreosbot run e2e calico |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
name: var-run-calico | ||
readOnly: false | ||
- name: install-cni | ||
image: quay.io/calico/cni:v1.10.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should this be {{ .Images.CalicoCNI }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have been addressed by #726
--network-provider
)This is based on poseidon/bootkube-terraform, which uses Calico or flannel as the network provider on AWS, GCE, and bare-metal clusters (Digital Ocean w caveats). I figured it would be nice to have in the upstream project in case others want to use it.
rel #405
Closes #407