Skip to content
This repository was archived by the owner on Jul 30, 2021. It is now read-only.

Add network provider option and Calico support #714

Merged
merged 1 commit into from
Oct 2, 2017
Merged

Add network provider option and Calico support #714

merged 1 commit into from
Oct 2, 2017

Conversation

dghubble
Copy link
Contributor

@dghubble dghubble commented Sep 19, 2017

  • Support choosing Flannel or Calico networking (--network-provider)
  • On Jenkins, I've added a new test to provision a cluster with calico and perform the usual e2e tests.

This is based on poseidon/bootkube-terraform, which uses Calico or flannel as the network provider on AWS, GCE, and bare-metal clusters (Digital Ocean w caveats). I figured it would be nice to have in the upstream project in case others want to use it.

rel #405
Closes #407

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 19, 2017
@@ -50,6 +50,8 @@ var (
serviceCIDR string
selfHostKubelet bool
cloudProvider string
networkProvider string
networkMTU int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get away with a sane default here rather than a flag? I'd really like to avoid this

Copy link
Contributor Author

@dghubble dghubble Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I was on the fence about this. An appropriate default is 1440. Real users would pick the right value depending on the environment or network performance suffers. Since bootkube render just produces valid example manifests (not ideal ones), perhaps we leave it to projects to generate assets that use the correct MTU. In my cases, I generate the right assets for the platform separately, so I'm not opinionated which way bootkube render goes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. flannel and Calico use different defaults, so setting it in one place breaks tests anyway. This means bootkube render will create clusters that work using the default settings, but aren't necessarily getting optimal throughput. I think that's ok given that higher level tools do handle these cases.

return errors.New("Must specify --network-provider flannel or calico")
}
if renderOpts.networkProvider == asset.NetworkCalico && renderOpts.calicoNetworkPolicy {
return errors.New("Cannot specify --experimental-calico-network-policy with --network-provider=calico, Calico supports network policy automatically.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we just drop this altogether and put this under network provider instead?

--network-provider={flannel,experimental-calico-network-policy, experimiental-calico-bgp}?

I'm leaning toward the experimental because if we don't have tests - we could break it at any time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimental prefixes work for me. I prefer we say "calico" rather than "calico-bgp", since bgp is redundant. I'd also prefer to keep "canal-style" policy-only projects separate from CNI providers, long term it doesn't really belong here since its not (supposed to be) providing networking or really essential to bringing up a minimal control plane. Suppose some other network policy app comes along. Its an addon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--network-provider={flannel,experimental-calico}

@klausenbusk
Copy link
Contributor

Should we add a brief pros/cons (/use-case) of flannel vs calico? For a newcomer I think it can be a bit hard to know what is "best".

@dghubble
Copy link
Contributor Author

coreosbot run e2e calico

@dghubble
Copy link
Contributor Author

coreosbot run e2e

@dghubble dghubble closed this Sep 20, 2017
@dghubble dghubble reopened this Sep 20, 2017
@dghubble
Copy link
Contributor Author

@klausenbusk to a degree, though maybe we just link to the relevant projects. So kubernetes-incubator projects stay clear of providing recommendations and leave it to users or to higher level tools to choose what's right for them.

Practically speaking, this Calico support in bootkube is experimental so flannel is still the supported option.

@dghubble
Copy link
Contributor Author

dghubble commented Sep 21, 2017

Re-enabling broken tests (#716) finds some issues in this PR :) I'll investigate.

@dghubble
Copy link
Contributor Author

Squashed and rebased

Copy link
Contributor

@aaronlevy aaronlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is looking good - but I'd like to move the "calico network policy" under the same umbrella (as "canal")

@@ -72,6 +73,7 @@ func init() {
cmdRender.Flags().StringVar(&renderOpts.serviceCIDR, "service-cidr", "10.3.0.0/24", "The CIDR range of cluster services.")
cmdRender.Flags().BoolVar(&renderOpts.selfHostKubelet, "experimental-self-hosted-kubelet", false, "(Experimental) Create a self-hosted kubelet daemonset.")
cmdRender.Flags().StringVar(&renderOpts.cloudProvider, "cloud-provider", "", "The provider for cloud services. Empty string for no provider")
cmdRender.Flags().StringVar(&renderOpts.networkProvider, "network-provider", "flannel", "CNI network provider (flannel or experimental-calico).")
cmdRender.Flags().BoolVar(&renderOpts.selfHostedEtcd, "experimental-self-hosted-etcd", false, "(Experimental) Create self-hosted etcd assets.")
cmdRender.Flags().BoolVar(&renderOpts.calicoNetworkPolicy, "experimental-calico-network-policy", false, "(Experimental) Add network policy support by calico.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should drop this right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or really this should be moved under networkProvider={flannel, experimental-calico, experimental-canal}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this and change the testing setup/scripts to work that way (Jenkins jobs pass enable_network_policy to terraform to scripts to bootkube render so the jobs need changes). It may be worth a separate PR.

Concerning whether we should, imo flannel with policy-only Calico (i.e. "canal") probably shouldn't be conflated with a top level CNI network provider. Policy-only Calico manifests can be deployed on top of a flannel cluster afterward, as an addon, no reason it needs to be part of control plane bootstrapping or a bootkube concern. I'd prefer we eventually remove --experimental-calico-network-policy and just point to the upstream plain-old-manifests that work fine. Its the same logic that governs why bootkube clusters don't come with Ingress or CLUO or other optional addons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd never recommend installing network policy after the fact (even if you could probably make it work). Essentially you'd end up relying on the name ordering of CNI configuration, (and that already makes me nervous), as well as deployed CNI plugin versions / compatibility. Basically -- not worth the risk (nor would we try and support that here). Networking IMO should only be considered an install-time choice.

To that end - I would much prefer that we have a single flag for the network provider rather than separating those concepts.

What about a separate PR (that comes before this) that changes the flag from:

--experimental-calico-network-policy=true

to

--network-provider=experimental-canal

Then this as a follow-up is merely adding a new provider?

Might also consider a minor version bump for this given the flag change. But given that it's experimental (and breaking changes should be expected), I could go either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd never recommend installing network policy after the fact... not worth the risk

Calico policy-only manifests are jumbled up with control plane manifests and get created in random order. Their install-cni sidecar races with flannel's sidecar. Creating these manifests early shouldn't provide any sense of safety. In my view, any of these setups that involve flannel with policy-only calico are racy.

nor would we try and support that here

If anything, I want the opposite. Less support responsibility for bootkube. I'd prefer we point to upstream to (try to) handle policy-only mode since their docs cover adding it to a flannel cluster. Elevating it to a top level network provider in bootkube means we continue supporting it. That's fine if we wanna keep trying, its just not a mode I'd encourage users to choose myself.

@dghubble
Copy link
Contributor Author

Basically just gonna re-do this

@dghubble
Copy link
Contributor Author

Force override old PR.

@dghubble
Copy link
Contributor Author

coreosbot run e2e

Copy link
Contributor

@aaronlevy aaronlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM. One question about a bash debug option.

@@ -1,5 +1,5 @@
#!/usr/bin/env bash
set -euo pipefail
set -euox pipefail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave the -x option, or was that during debugging? Or if it's generally useful should we turn it on for init-node.sh also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was for debugging, I'll remove, though its a bit useful

* Add network_provider to e2e scripts / Terraform setup
@dghubble
Copy link
Contributor Author

dghubble commented Oct 2, 2017

Failed was related to the Jenkins testing log collection setup, which was different on the e2e-calico test.

@dghubble
Copy link
Contributor Author

dghubble commented Oct 2, 2017

coreosbot run e2e

@dghubble
Copy link
Contributor Author

dghubble commented Oct 2, 2017

coreosbot run e2e calico

Copy link
Contributor

@diegs diegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dghubble dghubble merged commit 347605b into kubernetes-retired:master Oct 2, 2017
@dghubble dghubble deleted the calico-networking branch October 2, 2017 19:00
name: var-run-calico
readOnly: false
- name: install-cni
image: quay.io/calico/cni:v1.10.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should this be {{ .Images.CalicoCNI }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have been addressed by #726

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants