
Make sure flanneld gets QoS class "Guaranteed" to have a lower oom_score_adj #855

Merged
1 commit merged on Dec 7, 2017

Conversation

@Dieken (Contributor) commented Oct 26, 2017

Usually flanneld consumes 5m CPU and 15Mi memory according to "kubectl top".

Description


The flannel pod gets the default QoS class "BestEffort" when it doesn't specify CPU and memory resources,
which sets its oom_score_adj to 1000 and thus makes it more likely to be killed first when free memory runs low.

This patch assigns the same amounts for the CPU/memory requests and limits, which gives the flanneld pod the QoS class "Guaranteed" and an oom_score_adj of -998, as described at https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior.
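
For illustration, a minimal sketch of what such a container resources stanza looks like; the numbers here are placeholders (the actual values are settled later in this thread), and the key point is that requests equal limits, which is what makes kubelet assign the "Guaranteed" QoS class:

        resources:
          requests:
            cpu: "100m"      # placeholder value, not necessarily what this PR ships
            memory: "50Mi"
          limits:
            cpu: "100m"      # requests == limits on every container => "Guaranteed"
            memory: "50Mi"

After applying, "kubectl describe pod" shows the assigned QoS class in the pod status, which is a quick way to confirm the change took effect.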

@tomdee (Contributor) commented Oct 28, 2017

Should the limit be higher? Are these values suitable even for large clusters?

@Dieken (Contributor, Author) commented Oct 30, 2017

@tomdee, since flanneld runs per node, and most clusters don't create and destroy pods very frequently, I guess 10m CPU and 50Mi memory are enough for small-scale (~10 nodes) and medium-scale (~100 nodes) Kubernetes clusters. I believe the administrators of large-scale clusters are able to tune these parameters :-D

Maybe CoreOS has some test clusters that could confirm the appropriate resource limits, or we could survey a flannel mailing list.

BTW, I did encounter flanneld being OOM-killed by the Linux kernel, even though it doesn't consume much memory.

@tomdee (Contributor) commented Oct 31, 2017

I think this would be a great PR to merge, but with the current values I'm worried that it will create (additional) hard-to-diagnose problems, since the limits might be too low.

@Dieken (Contributor, Author) commented Nov 1, 2017

Maybe 50m CPU and 100Mi memory would be safer. I'm not sure about the best numbers; it needs some benchmarking and a survey.

@tomdee (Contributor) commented Nov 24, 2017

OK, my testing has shown that a 1000-node vxlan cluster (unfortunately with the etcd backend) uses ~22MB. Using flannel with the k8s subnet manager adds about 10MB to the memory usage, so I think a limit of 100m CPU and 50Mi memory would be OK. If you make this change to all the yamls in this repo, I can merge it.

…_adj

Usually flanneld consumes 5m CPU and 15Mi memory according to "kubectl top".
According to a test by @tomdee at flannel-io#855 (comment),
50Mi is enough for a 1000-node vxlan cluster.
@Dieken (Contributor, Author) commented Dec 1, 2017

@tomdee Thank you very much for your testing. I just updated my PR and changed all the yamls in Documentation/.

@Dieken (Contributor, Author) commented Dec 1, 2017

In commit 014b2d5, @osoriano changed this line in Documentation/kube-flannel.yml:

@@ -100,7 +110,7 @@ spec:
         args:
         - -f
         - /etc/kube-flannel/cni-conf.json
-        - /etc/cni/net.d/10-flannel.conf
+        - /etc/cni/net.d/10-flannel.conflist
         volumeMounts:
         - name: cni
           mountPath: /etc/cni/net.d

Is this a mistake?

@osoriano (Contributor) commented Dec 1, 2017

@Dieken (Contributor, Author) commented Dec 2, 2017

@osoriano Sorry, I didn't know the difference between .conf and .conflist; I thought it was a typo, since I hadn't upgraded to the latest kube-flannel.yml.

BTW, is that fix required for the other yaml files in Documentation/? And maybe the initContainer needs to delete 10-flannel.conf for a smooth upgrade.
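
For what it's worth, a rough sketch of the kind of cleanup step being suggested here, assuming an initContainer that shares the cni volume already mounted by the pod; the container name and busybox image are illustrative, not what the repo actually ships:

      initContainers:
      - name: remove-old-cni-conf          # hypothetical name, for illustration only
        image: busybox                     # any image with a shell would do
        command: ["sh", "-c", "rm -f /etc/cni/net.d/10-flannel.conf"]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d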

@Dieken (Contributor, Author) commented Dec 6, 2017

@tomdee could you review the PR and merge it? The discussion in the last three comments is unrelated.

@osoriano (Contributor) commented Dec 6, 2017

@Dieken, thanks for pointing that out. The .conflist files are parsed differently since they contain a list of plugins for plugin chaining.

If both 10-flannel.conf and 10-flannel.conflist are present, only 10-flannel.conf will be used (it is the first file in the sorted list). You're right, the old file should be removed in an upgrade, although leaving it in place won't break the upgrade either.

I only tested the change with kube-flannel.yml, but it seems like others could be updated as well.
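
For anyone following along, a rough sketch of the structural difference: a .conf file contains a single plugin configuration object, while a .conflist wraps several plugin configurations in a "plugins" array so they can be chained. The contents below are illustrative only, not the exact file shipped in this repo:

    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }

Since JSON has no comments, treat every value above (the network name, the delegate settings, the chained portmap plugin) as a placeholder.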

@squeed (Contributor) commented Dec 6, 2017

@osoriano not exactly - Kubernetes prefers conflists over confs, regardless of the sorting.

@osoriano (Contributor) commented Dec 6, 2017

@squeed Can you help me understand where this happens in the code?

I'm not that familiar with CNI, but I saw that the conf files are first sorted, and then we return the first valid conf.

Also, from the documentation: "If there are multiple CNI configuration files in the directory, the first one in lexicographic order of file name is used." [1]

[1] https://kubernetes.io/docs/concepts/cluster-administration/network-plugins/

@squeed (Contributor) commented Dec 7, 2017

@osoriano argh, you're completely right... And I even wrote that code. I was thinking of a different CNI loading shim I wrote for rkt :-).
