
Make sure flanneld gets QoS class "Guaranteed" to have a lower oom_score_adj #855

Merged
1 commit merged on Dec 7, 2017

Conversation

@Dieken (Contributor) commented Oct 26, 2017

Usually flanneld consumes 5m CPU and 15Mi memory according to "kubectl top".

Description


The flannel pod gets the default QoS class "BestEffort" when it doesn't specify CPU and memory resources,
which sets its oom_score_adj to 1000 and thus makes it more likely to be killed first when free memory runs low.

This patch assigns the same amounts for the CPU/memory requests and limits, which gives the flanneld pod the QoS class "Guaranteed" and an oom_score_adj of -998, as described at https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior.
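
For illustration, a minimal sketch of what such a container resources stanza looks like; the numbers here are placeholders (the actual values are settled later in this thread), and the key point is that requests equal limits, which is what makes kubelet assign the "Guaranteed" QoS class:

        resources:
          requests:
            cpu: "100m"      # placeholder value, not necessarily what this PR ships
            memory: "50Mi"
          limits:
            cpu: "100m"      # requests == limits on every container => "Guaranteed"
            memory: "50Mi"

After applying, "kubectl describe pod" shows the assigned QoS class in the pod status, which is a quick way to confirm the change took effect.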

@tomdee (Contributor) commented Oct 28, 2017

Should the limit be higher? Are these values suitable even for large clusters?

@Dieken (Contributor, Author) commented Oct 30, 2017

@tomdee, since flanneld runs per node, and most clusters don't create and destroy pods very frequently, I guess 10m CPU and 50Mi memory are enough for small-scale (~10 nodes) and medium-scale (~100 nodes) Kubernetes clusters. I believe the administrators of large-scale clusters are able to tune these parameters :-D

Maybe CoreOS has some test clusters that could confirm the appropriate resource limits, or we could survey a flannel mailing list.

BTW, I did encounter flanneld being OOM-killed by the Linux kernel, even though it doesn't consume much memory.

@tomdee (Contributor) commented Oct 31, 2017

I think this would be a great PR to merge, but with the current values I'm worried that it will create (additional) hard-to-diagnose problems, since the limits might be too low.

@Dieken (Contributor, Author) commented Nov 1, 2017

Maybe 50m CPU and 100Mi memory would be safer. I'm not sure about the best numbers; it needs some benchmarking and a survey.

@tomdee (Contributor) commented Nov 24, 2017

OK, my testing has shown that a 1000-node vxlan cluster (unfortunately with the etcd backend) uses ~22MB. Using flannel with the k8s subnet manager adds about 10MB to the memory usage, so I think a limit of 100m CPU and 50Mi memory would be OK. If you make this change to all the yamls in this repo, I can merge it.

…_adj

Usually flanneld consumes 5m CPU and 15Mi memory according to "kubectl top".
According to a test by @tomdee at flannel-io#855 (comment),
50Mi is enough for a 1000-node vxlan cluster.
@Dieken (Contributor, Author) commented Dec 1, 2017

@tomdee Thank you very much for your testing. I just updated my PR and changed all the yamls in Documentation/.

@Dieken (Contributor, Author) commented Dec 1, 2017

In commit 014b2d5, @osoriano changed this line in Documentation/kube-flannel.yml:

@@ -100,7 +110,7 @@ spec:
         args:
         - -f
         - /etc/kube-flannel/cni-conf.json
-        - /etc/cni/net.d/10-flannel.conf
+        - /etc/cni/net.d/10-flannel.conflist
         volumeMounts:
         - name: cni
           mountPath: /etc/cni/net.d

Is this a mistake?

@osoriano (Contributor) commented Dec 1, 2017

@Dieken (Contributor, Author) commented Dec 2, 2017

@osoriano Sorry, I didn't know the difference between .conf and .conflist; I thought it was a typo, since I hadn't upgraded to the latest kube-flannel.yml.

BTW, is that fix required for the other yaml files in Documentation/? And maybe the initContainer needs to delete 10-flannel.conf for a smooth upgrade.
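
For what it's worth, a rough sketch of the kind of cleanup step being suggested here, assuming an initContainer that shares the cni volume already mounted by the pod; the container name and busybox image are illustrative, not what the repo actually ships:

      initContainers:
      - name: remove-old-cni-conf          # hypothetical name, for illustration only
        image: busybox                     # any image with a shell would do
        command: ["sh", "-c", "rm -f /etc/cni/net.d/10-flannel.conf"]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d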

@Dieken (Contributor, Author) commented Dec 6, 2017

@tomdee could you review the PR and merge it? The discussion in the last three comments is unrelated.

@osoriano (Contributor) commented Dec 6, 2017

@Dieken, thanks for pointing that out. The .conflist files are parsed differently since they contain a list of plugins for plugin chaining.

If both 10-flannel.conf and 10-flannel.conflist are present, only 10-flannel.conf will be used (it is the first file in the sorted list). You're right, the old file should be removed in an upgrade, although leaving it in place won't break the upgrade either.

I only tested the change with kube-flannel.yml, but it seems like others could be updated as well.
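
For anyone following along, a rough sketch of the structural difference: a .conf file contains a single plugin configuration object, while a .conflist wraps several plugin configurations in a "plugins" array so they can be chained. The contents below are illustrative only, not the exact file shipped in this repo:

    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }

Since JSON has no comments, treat every value above (the network name, the delegate settings, the chained portmap plugin) as a placeholder.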

@squeed (Contributor) commented Dec 6, 2017

@osoriano not exactly - Kubernetes prefers conflists over confs, regardless of the sorting.

@osoriano (Contributor) commented Dec 6, 2017

@squeed Can you help me understand where this happens in the code?

I'm not that familiar with CNI, but I saw that the conf files are first sorted, and then we return the first valid conf.

Also, from the documentation: "If there are multiple CNI configuration files in the directory, the first one in lexicographic order of file name is used." [1]

[1] https://kubernetes.io/docs/concepts/cluster-administration/network-plugins/

@squeed (Contributor) commented Dec 7, 2017

@osoriano argh, you're completely right... And I even wrote that code. I was thinking of a different CNI loading shim I wrote for rkt :-).
