Calico's Typha pods should prefer to run on masters, but tolerate running elsewhere #9608
Comments
@KashifSaadat what do you think?
Note that we should probably also accommodate the more modern node label used by kubeadm, node-role.kubernetes.io/master, just mandating that it exists, its value being immaterial.
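For illustration, a match on both labels could look like the sketch below (an assumption about how the change might be expressed, not the merged diff); the two nodeSelectorTerms are ORed, and the kubeadm-style label only needs to be present:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Older label style; the value must be "master".
      - matchExpressions:
        - key: kubernetes.io/role
          operator: In
          values: ["master"]
      # kubeadm-style label; only its existence matters, its value is immaterial.
      - matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: Exists
```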
Thank you for the fast resolution, @hakman!
Sorry for the delayed response! I ran into a similar issue and am happy with the approach taken and the fix that has been merged in. 👍
@KashifSaadat should I add this change to the Canal manifest as well? At the moment it is only in Calico.
Yes please, that would be great 👌
What is the reason or use case for nodeSelect'ing or setting affinity to the master nodes for typha?
In clusters with a lot of worker node churn—especially with the cluster autoscaler dropping ASGs down to zero instances—placing Typha on the masters keeps the pods running consistently without holding up worker nodes unnecessarily.
What do you mean by
To clear up a possible point of confusion here, Typha temporarily being unavailable would not disrupt any pod traffic. The only disruption would be that new policy changes are delayed until a new Typha pod is running.
I've seen cases where the cluster autoscaler can't drop a worker node because a Typha pod is running on it. That Typha pod could run on one of the master nodes that are not subject to the autoscaler's control. The safe-to-evict annotation didn't work as expected. We wound up paying to run that worker node unnecessarily. Ideally we'd scale the Typha Deployment with something like the horizontal proportional autoscaler based on the number of nodes in the cluster.
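As an aside, the annotation referred to here is set on the Deployment's pod template; a minimal sketch (an illustration, not the manifest actually shipped by kops):

```yaml
# Fragment of a calico-typha Deployment (sketch only).
spec:
  template:
    metadata:
      annotations:
        # Permits the cluster autoscaler to evict this pod when scaling a node group down.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```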
It sounds like this is just working around the safe-to-evict annotation not working. Was there a bug submitted for kops about the safe-to-evict annotation not working? I tried searching for one and didn't find one that sounded like what you described.
No, not quite. We don't want Typha running on worker nodes that are less likely to stay up than the master nodes. We used a preference in our node affinity. We prefer but don't require that Typha run on the masters. Since you can only run one Typha pod per node anyway, once you need more than three Typha pods, you may—but probably don't—want more than three masters. At that point, placing them on worker nodes is fine.

Regarding filing a defect, I wasn't using kops when I saw that problem with the cluster autoscaler failing to evict Typha's pods. I recall the problem having to do with the difference between the "critical" annotations and running the Typha pods at a high enough priority. At some point Tigera moved from using that deprecated annotation (maybe "scheduler.alpha.kubernetes.io/critical-pod") to using the "system-cluster-critical" priority class and the toleration for the "CriticalAddonsOnly" taint. Eviction worked before then, but I think the pod priority may be higher than the autoscaler will tolerate for its default eviction configuration. It's been about ten months since I last looked into it.
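For context, the settings mentioned above look roughly like this in a pod spec (a sketch of the Tigera-style configuration, not the exact kops manifest):

```yaml
spec:
  # Replaces the deprecated scheduler.alpha.kubernetes.io/critical-pod annotation.
  priorityClassName: system-cluster-critical
  tolerations:
  # Allows scheduling onto nodes that carry the CriticalAddonsOnly taint.
  - key: CriticalAddonsOnly
    operator: Exists
```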
Following up from #9240 (comment), we'd like to run more Calico Typha pods than we have master nodes, for clusters with 600 nodes or more. Unfortunately, we can't get more Typha pods scheduled than we have master nodes, even when we have other non-master nodes that could host Typha as well.
What kops version are you running?
Version 1.18.0-beta.2 (git-63f9ae1099)
What Kubernetes version are you running?
What cloud provider are you using?
AWS EC2
What commands did you run? What is the simplest way to reproduce this issue?
What happened after the commands executed?
One of the Typha pods remains pending, not placed by the Kubernetes scheduler, because each node can host at most one such pod and there are fewer master nodes than Typha pods.
What did you expect to happen?
The first three Typha pods would be scheduled on the master nodes, but the fourth Typha pod would be scheduled on a different non-master node.
Anything else we need to know?
Using a node affinity rule in the Typha Deployment's pod spec template would allow us to express a preference for Typha running on a master node, but not a strict requirement. In this case, we'd use a preferredDuringSchedulingIgnoredDuringExecution entry with a match expression adapted from the existing node selector, and we'd remove the node selector.
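A minimal sketch of that pod spec fragment, assuming the existing node selector keys on the kops kubernetes.io/role=master label (the label in the shipped manifest may differ):

```yaml
# Sketch: drop the nodeSelector and express a soft preference for masters,
# so replicas beyond the number of masters can still land on worker nodes.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: kubernetes.io/role
          operator: In
          values: ["master"]
```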