Use a dedicated etcd cluster for Cilium CNI #6496

Closed
olemarkus opened this issue Feb 20, 2019 · 9 comments · Fixed by #7646
Comments

@olemarkus
Member

Since Kops is locking down etcd with version 1.12, Cilium should start using a dedicated etcd cluster. One option is to use the cilium-etcd-operator that ships with Cilium 1.4. Example resources for provisioning this can be found here: https://github.com/cilium/cilium/tree/master/examples/kubernetes/1.11

@so0k
Contributor

so0k commented Apr 10, 2019

The bundled addon is also still using Cilium 1.0-stable instead of 1.4.

EDIT: it seems .Version has been configurable since June 2018. I thought it was not possible to use Cilium 1.4 in our clusters, but my assumptions were wrong.

EDIT2: the issues we had with the currently bundled Cilium templates are the DaemonSet missing features and the RBAC missing permissions.

The suggestion to use cilium-etcd-operator by default should come with a warning: it is meant for "small" clusters, it doesn't consider the IOPS requirements of etcd, and etcd failures may result in seconds of network instability while Cilium recovers its state into etcd.

@Globegitter
Contributor

Globegitter commented Apr 25, 2019

Yep, it would be great to have a working solution for the upcoming 1.12 that works with more than just "small" clusters.

cc @nebril

@olemarkus
Member Author

The way forward would be to add cilium-etcd-operator to the cilium kops add-on. The problem is that during cluster creation, only master nodes are available, and cilium-etcd-operator and the pods it spawns currently don't have the tolerations necessary for them to be scheduled on master nodes. So that has to be added before anything can be done on the kops side.
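For reference, the missing tolerations would look roughly like this. This is a hedged sketch, not the operator's actual manifest: the taint keys shown are the ones kops masters commonly carry, and you should verify them against your cluster.

```yaml
# Hypothetical fragment of the cilium-etcd-operator pod spec: tolerations
# that would allow scheduling onto kops master nodes during bootstrap.
spec:
  template:
    spec:
      tolerations:
        # kops masters carry this taint by default
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        # tolerate not-ready nodes, since during cluster creation only
        # masters exist and pod networking is not up yet
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoSchedule
```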

For new clusters using etcd-manager, it won't be possible to use cilium at all because cilium running on normal nodes won't be able to talk to kops etcd anymore.

For existing clusters, we have managed to migrate to cilium-etcd-operator just fine, since its pods can then be scheduled on normal nodes and we change the Cilium k/v store before switching to etcd-manager.
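The k/v store switch amounts to repointing cilium-agent at the operator-managed etcd instead of the kops etcd. A sketch of the relevant cilium-config ConfigMap keys, assuming the service and secret names used by the Cilium 1.4-era example manifests (verify against your Cilium version):

```yaml
# Hypothetical ConfigMap fragment: point Cilium at the etcd cluster
# managed by cilium-etcd-operator rather than the kops etcd.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  kvstore: etcd
  kvstore-opt: '{"etcd.config": "/var/lib/etcd-config/etcd.config"}'
  etcd-config: |-
    endpoints:
      # client service created by cilium-etcd-operator (assumed name)
      - https://cilium-etcd-client.kube-system.svc:2379
    trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
    key-file: '/var/lib/etcd-secrets/etcd-client.key'
    cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
```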

@Globegitter
Contributor

Ah that is good to know @olemarkus, and you are not experiencing any of the performance/scalability/reliability issues that @so0k is referring to?

And the tolerations issue is cilium/cilium-etcd-operator#42 I presume?

@Globegitter
Contributor

Just had a chat with @tgraf, and with Cilium 1.6 there will also be CRD support: cilium/cilium#7573, which would make it possible to run without etcd when the cluster is below a certain size.
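In CRD mode the kvstore would simply be dropped from the configuration. A rough sketch of what that would look like, assuming the configuration key the 1.6 work is heading towards (check the 1.6 release docs before relying on it):

```yaml
# Hypothetical cilium-config fragment for Cilium 1.6+: store state in
# Kubernetes CRDs instead of an external etcd kvstore.
data:
  identity-allocation-mode: crd
  # the kvstore / kvstore-opt / etcd-config keys are simply omitted
```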

@olemarkus
Member Author

Ah that is good to know @olemarkus, and you are not experiencing any of the performance/scalability/reliability issues that @so0k is referring to?

Performance-wise it should be more than enough. This is also what Cilium uses for scalability testing, I think.

Reliability ... we have seen the etcd cluster get recreated during rolling updates. This has been improving quite a bit, though.

And the tolerations issue is cilium/cilium-etcd-operator#42 I presume?

Yes.

@tushar00jain
tushar00jain commented Jul 7, 2019

I tried installing Cilium with the etcd operator manually using networking: cni and it works fine. Is there any downside to this? I hope Cilium will provide an easy migration from the etcd operator to CRDs, though.

@olemarkus
Member Author

Given that virtually no resources match the ones that kops creates anymore, I think it makes sense to set networking: cni. We have done the same on some of our clusters.

That will also ensure kops does not override these resources unexpectedly in the future.
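In the kops cluster spec that change is minimal. A sketch (the rest of the spec is unchanged; Cilium itself is then deployed and managed outside of kops):

```yaml
# kops cluster spec fragment: hand networking over to a self-managed
# CNI deployment instead of the kops-managed cilium addon.
spec:
  networking:
    cni: {}
```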

@olemarkus
Member Author

#7474
