(Re-)Enable resource reservation by default for core components and host container runtime #9692
To note, my guess is that the "culprit" is actually
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I agree, this seems to be a serious issue. Before #9209 the default result in /etc/kubernetes/kubelet-config.yaml included
Referring to official docs (capitalization mine for emphasis)
My interpretation is that defining resources for KubeReserved simply reduces the node's allocatable resources by a small amount, to ensure that pods don't overwhelm the node. This is purely a scheduling matter. As the doc notes, an optional additional step can be taken: defining a separate cgroup for the k8s daemons to run in. But doing this, and actually creating that cgroup, aside from being optional and unnecessary, can be dangerous if it restricts the resource usage of the k8s daemons. In other words, I think we need the following as default behaviour:
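That reading can be sketched as a minimal KubeletConfiguration fragment (a sketch only — the field names are from the upstream KubeletConfiguration v1beta1 API, and the values are illustrative, not Kubespray's defaults):

```yaml
# Illustrative kubelet-config.yaml fragment: reserve resources so the
# scheduler subtracts them from Allocatable, but only enforce a limit on
# the "pods" cgroup, not on kube-reserved/system-reserved cgroups.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 200m
  memory: 512Mi
systemReserved:
  cpu: 100m
  memory: 256Mi
# Allocatable = Capacity - kubeReserved - systemReserved - eviction thresholds
enforceNodeAllocatable:
  - pods
```

With `enforceNodeAllocatable: [pods]`, no `kubeReservedCgroup`/`systemReservedCgroup` needs to exist, and the daemons themselves are never capped — only the pods slice is.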
That would indeed explain the strange behaviors I ended up seeing. I sort of gave up and massively increased RAM capacity to work around it, as I wasn't able to come up with a sensible solution at the time. But if having a kube-reserved cgroup causes things like etcd to be OOM-killed for going over it, it is indeed more harmful than anything else.
@Tristan971 What value of
It appears that I also decided to not gamble on it, as I see the following diff in the commit where I changed it:

```diff
- kubelet_enforce_node_allocatable: "pods,kube-reserved,system-reserved"
+ kubelet_enforce_node_allocatable: "pods"
```

Additionally, at the same time:

```diff
- system_master_memory_reserved: 512Mi
+ system_master_memory_reserved: 4Gi
```

This was more than 1.5 years ago, however, so I unfortunately cannot recall whether both changes were necessary, or whether I was simply trying to make the issue go away.
#11367 will fix the issue of the disappearing reservation.
Thanks for the PR and pushing through with this! 👍
What would you like to be added:
Considering Kubespray's "deploy a production ready Kubernetes cluster" slogan, it seems like we should always restrain the cgroup slice of Kubernetes pods to not consume ALL CPU and memory resources of the host.
Why is this needed:
After #9209, which is a neat change in itself (nothing against it), Kubespray's default is to no longer reserve any CPU/memory for the host and core Kubernetes components, which feels like a bit of a downgrade.
Running the upgrade in one of our dev clusters, I noticed that the allocatable resources now equal the host resources. That is somewhat concerning, at least with regard to memory, as we'd obviously prefer to never see the Linux OOM killer in action if at all possible...
I appreciate that this now depends on using systemd with cgroups support, but it still seems quite useful as a default to me?
If not, we should at least document it more prominently in the 2.21 release notes, as reserving 200m of CPU and 512Mi of memory was the previous default.
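For operators who want the previous behaviour back explicitly, a hedged sketch of the relevant inventory variables (the `kube_reserved`, `kube_cpu_reserved`, and `kube_memory_reserved` names are assumptions extrapolated from the variables quoted in this thread — verify them against your Kubespray release's defaults):

```yaml
# Hypothetical group_vars fragment restoring the previous defaults quoted
# above (200m CPU / 512Mi memory reserved for core components).
# Variable names are assumptions; check your Kubespray version.
kube_reserved: true
kube_cpu_reserved: 200m
kube_memory_reserved: 512Mi
# Only enforce the pods cgroup, per the discussion above about cgroup
# enforcement OOM-killing control-plane daemons such as etcd.
kubelet_enforce_node_allocatable: "pods"
```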