[WIP] - Add ability to configure number of Typha Replicas #7181
Conversation
Hi @gjtempleton. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: gjtempleton. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
Looks good to me ... I left just the one comment .. Also, does this work with Canal? .. Note, you'll need to bump the manifest versions to get this to roll out .. I believe here
```yaml
networking:
  calico:
    typhaReplicas: 3
```
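For context, in the kops cluster spec the new field would sit roughly like this; a minimal sketch, assuming the standard Cluster API group and a placeholder cluster name:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local   # placeholder cluster name
spec:
  networking:
    calico:
      # Run three Typha instances so calico/node pods fan out through Typha
      # instead of each watching the APIServer directly.
      typhaReplicas: 3
```

Assuming the usual kops workflow, this would be set via `kops edit cluster` followed by a cluster update.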
Can an existing Calico installation be upgraded to Typha without incident? .. I'm guessing no, and if so, perhaps just chunk in a helpful comment indicating that.
I'll need to double check whether it's still the case, but it used to be (~ version 3.2 from memory) that they could: as the new calico-node pods come up and see the new value for FELIX_TYPHAK8SSERVICENAME, they connect to Typha rather than directly to the APIServer.
I'll have a play around with that at some point over the next couple of days to confirm though.
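For reference, the mechanism described above is the FELIX_TYPHAK8SSERVICENAME environment variable on the calico-node containers; a rough sketch of the relevant DaemonSet excerpt, where the Service name shown is the conventional one and is assumed here:

```yaml
# Sketch of the calico-node DaemonSet container spec: once this env var
# points at the Typha Service, newly started calico/node pods connect to
# Typha rather than watching the APIServer directly.
containers:
- name: calico-node
  image: calico/node:v3.4.0
  env:
  - name: FELIX_TYPHAK8SSERVICENAME
    value: calico-typha   # assumed name of the Typha Service in kube-system
```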
Agreed that this is an important thing for us to test and confirm. I feel like quite a few people will be excited for this and jump in when they see it in the release notes!
/ok-to-test
Thanks for the super-quick response and feedback, knew I'd missed something around the manifests!
On the Canal front, I've just had to bring myself back up to speed with the current state of Canal, and it seems the docs don't have the same recommendations around Typha usage there: Flannel is the bit performing the heavy lifting of networking whilst Calico is just doing policy, so there's no need for the fan-out. Worth noting that I've put in …
/retest
Hi. I just manually applied these changes on my cluster (kops 1.12.2 / k8s v1.12.9). The commit uses the image calico/typha:v3.7.2. This version is not compatible with the rest of the manifest and with the version of the calico nodes (calico/node:v3.4.0), causing problems with the startup process of Typha (and with the calico-node pods that are waiting for the service). Version 3.7.2 needs an extra set of CRDs and resources in the ClusterRoleBinding (please check this to see the difference). After switching the image to calico/typha:v3.4.0-amd64 everything worked properly. By the way, I tried this on a cluster with a working Calico installation (without Typha) and route reflectors. Finally, some errors I got on the first try:
Hi @semoac, thanks for giving it a try and sorry about the issues you had. I made the assumption that #7051 would be merged before this is merged, and that will take care of the extra CRDs required etc. (I throw-away mentioned this in an earlier comment, but should have been clearer in the PR description; will add that now.) Good to know it worked for you once you downgraded the image.
ETA: Stuck a WIP on it until it's fully tested, to answer the question around migrations from an existing 0-Typha setup, and until the pre-requisite PR is merged.
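For illustration, the image downgrade @semoac describes amounts to pinning the Typha Deployment's container image back to the version line used by the calico/node pods; a sketch, with names based on the stock upstream Calico manifests rather than this PR's diff:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calico-typha        # conventional name from the upstream manifests (assumed)
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      k8s-app: calico-typha
  template:
    metadata:
      labels:
        k8s-app: calico-typha
    spec:
      containers:
      - name: calico-typha
        # Pinned to match calico/node:v3.4.0 elsewhere in the manifest,
        # avoiding the extra CRD/RBAC requirements that v3.7.2 introduces.
        image: calico/typha:v3.4.0-amd64
        ports:
        - containerPort: 5473   # Typha's default listen port (assumed)
          name: calico-typha
          protocol: TCP
```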
My mistake. Thanks for pointing that out for me.
Adds the ability to configure the number of Typha replicas when using Calico CNI in 1.12+ to limit the impact of Calico on the APIServer and increase the scalability of the cluster. Also adds the ability to configure Typha's Prometheus config.
Add Passing TyphaReplicas Validation Test
Force-pushed from dce7003 to fcc85e2
Adds the ability to configure the number of Typha replicas when using Calico CNI in 1.12+
to limit the impact of Calico on the APIServer and increase the scalability of the cluster.
Also adds the ability to configure Typha's Prometheus config.
Resolves #7158
~~Dependent on #7051 due to the Typha image being 3.7.2~~ This has now been merged. Just dependent on me getting around to testing the migration path now.
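As a sketch of what the Prometheus side of the spec could look like, assuming field names along the lines of the existing Calico options (these exact names and the port are my assumption, not confirmed by this PR's text):

```yaml
spec:
  networking:
    calico:
      typhaReplicas: 3
      typhaPrometheusMetricsEnabled: true   # assumed field name
      typhaPrometheusMetricsPort: 9093      # assumed field name and port
```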