make sure flanneld got QoS class "Guaranteed" to have lower oom_score_adj #855
Conversation
Should the limit be higher? Are these values suitable even for large clusters?
@tomdee, since flanneld runs per node and most clusters don't frequently create and destroy Pods, I guess 10m CPU and 50Mi memory are enough for small-scale (~10 nodes) and medium-scale (~100 nodes) Kubernetes clusters. I believe administrators of large-scale clusters are able to tune these parameters themselves :-D Maybe CoreOS has some test clusters to confirm the appropriate resource limits, or could run a survey on a flannel mailing list. BTW, I did encounter flanneld being OOM-killed by the Linux kernel even though it doesn't consume much memory.
I think this would be a great PR to merge, but with the current values I'm worried that it will create (additional) hard-to-diagnose problems since the limits might be too low.
Maybe 50m CPU and 100Mi memory would be safer. I'm not sure about the best numbers; they need some benchmarking and a survey.
OK, my testing has shown that a 1000-node VXLAN cluster (unfortunately with the etcd backend) uses ~22MB. Using flannel with the k8s subnet manager adds about 10MB to the memory usage, so I think a limit of 100m CPU and 50Mi memory would be OK. If you make this change to all the YAMLs in this repo, I can merge it.
…_adj Usually flanneld consumes 5m CPU and 15Mi memory according to "kubectl top". According to testing by @tomdee at flannel-io#855 (comment), 50Mi is enough for a 1000-node VXLAN cluster.
@tomdee Thank you very much for your testing. I just updated my PR and changed all the YAMLs in Documentation/.
@Dieken 10-flannel.conflist should be read by CNI here. Is there some error you are seeing?
@osoriano Sorry, I didn't know the difference between .conf and .conflist; I thought it was a typo and hadn't upgraded to the latest kube-flannel.yml. BTW, is that fix required for the other YAML files in Documentation/? And maybe the initContainer needs to delete 10-flannel.conf for a smooth upgrade.
@tomdee could you review the PR and merge it? The discussion in the last three comments is unrelated.
@Dieken, thanks for pointing that out. If both 10-flannel.conf and 10-flannel.conflist are present, only 10-flannel.conf will be used (the first file in the sorted list). You're right, the old file should be removed in an upgrade, although leaving it won't break the upgrade either. I only tested the change with kube-flannel.yml, but it seems like the others could be updated as well.
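For the upgrade cleanup mentioned above, here is a minimal sketch of what such an initContainer could look like, assuming the CNI config directory is mounted at /etc/cni/net.d via a volume named cni (the container name, image, and volume name are illustrative, not taken from this PR):

```yaml
# Hypothetical sketch (not part of this PR): remove the stale 10-flannel.conf
# so that only 10-flannel.conflist remains after upgrading to the new manifest.
initContainers:
  - name: remove-old-cni-conf
    image: busybox
    command: ["sh", "-c", "rm -f /etc/cni/net.d/10-flannel.conf"]
    volumeMounts:
      - name: cni
        mountPath: /etc/cni/net.d
```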
@osoriano not exactly - Kubernetes prefers conflists over confs, regardless of the sorting.
@squeed Can you help me understand where this happens in the code? I'm not that familiar with CNI, but I saw that the conf files are first sorted, and then we return on the first valid conf. Also, from the documentation: [1] https://kubernetes.io/docs/concepts/cluster-administration/network-plugins/
@osoriano argh, you're completely right... And I even wrote that code. I was thinking of a different CNI loading shim I wrote for rkt :-).
Usually flanneld consumes 5m CPU and 15Mi memory according to "kubectl top".
Description
The flannel pod gets the default QoS class "BestEffort" when it doesn't specify CPU and memory resources;
this makes its oom_score_adj 1000, so it is more likely to be killed first when free memory runs low.
This patch sets the same amount for the CPU/memory requests and limits, which gives the flanneld pod the QoS class "Guaranteed" and an oom_score_adj of -998, as described at https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior.
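As an illustration, here is a minimal sketch of the resources section such a change adds to the flannel container, using the 100m CPU / 50Mi memory values discussed above (the exact values in the merged YAMLs may differ):

```yaml
# Sketch only: setting requests equal to limits is what gives the pod the
# "Guaranteed" QoS class and the lower oom_score_adj of -998.
resources:
  requests:
    cpu: "100m"
    memory: "50Mi"
  limits:
    cpu: "100m"
    memory: "50Mi"
```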