optimize cgroups settings for node reserved #9209
Conversation
Hi @shelmingsong. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @shelmingsong, thank you for submitting this. Before we accept the change, could you please explain how this is an optimisation? The design document is from the 1.7 days (2018), and the kubelet cgroup driver already takes care of setting the hierarchy. Adding extra logic to the systemd units seems counter-intuitive and prone to clash with future versions of Kubernetes, so I would personally like to understand what benefit we get by adding this. /hold
Hi @cristicalin, thank you for your reply. Let me describe the reasons for these changes in detail.
When kube-reserved enforcement is enabled (that is, enforce-node-allocatable includes kube-reserved), the kubelet expects the Kubernetes system daemons (the kubelet itself and the container runtime) to run inside the cgroup specified by kubeReservedCgroup. This can be seen in this official Kubernetes document: [enforcing-node-allocatable](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#enforcing-node-allocatable)
If we do not specify a cgroup slice in the containerd service, the containerd cgroup is created under system.slice by default, which is not what kube-reserved enforcement expects.
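For reference, here is a minimal sketch of the KubeletConfiguration fields involved; the values are illustrative, not kubespray's defaults:

```yaml
# Illustrative KubeletConfiguration fragment; values are examples, not kubespray defaults.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Resources reserved for Kubernetes system daemons (kubelet, container runtime).
kubeReserved:
  cpu: 200m
  memory: 512Mi
# Cgroup on which the reservation is enforced; the daemons must actually run
# under it, which is why the runtime unit needs an explicit Slice= setting.
kubeReservedCgroup: /kube.slice
enforceNodeAllocatable:
  - pods
  - kube-reserved
```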
Thank you @shelmingsong for the explanation, it makes sense to me.
/hold cancel
Please also address the additional comments regarding documenting this feature and support for the other CRIs.
@@ -23,6 +23,10 @@ kubelet_kubelet_cgroups_cgroupfs: "/system.slice/kubelet.service"
kubelet_fail_swap_on: true

# Reserve this space for kube resources
# Set to true to reserve resources for kube daemons
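For illustration, the reserved-resources variables added to the sample inventory could look roughly like the sketch below. The variable names come from the diffs in this PR; the exact default values here are assumptions.

```yaml
# Sketch of the sample group_vars additions (values are assumptions; see the PR files for the real defaults).
# Set to true to reserve resources for kube daemons
kube_reserved: false
# systemd slice under which the kubelet and container runtime services are placed
kube_reserved_cgroups_for_service_slice: kube.slice
kube_reserved_cgroups: "/{{ kube_reserved_cgroups_for_service_slice }}"
# Resources reserved for the kube daemons
kube_memory_reserved: 256Mi
kube_cpu_reserved: 100m
```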
Please add these variables to the sample files as well, as they serve a documentation role.
It would also be useful to add some more extensive documentation for these variables in the docs/ folder.
OK, I have added these variables to the sample file.
In addition, I added a document:
I'm sorry, I misunderstood the yellow "Pending" decorator on my comments, so the above comment was only submitted just now (it was actually written many days ago).
@@ -35,6 +35,9 @@ LimitNOFILE=infinity
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999
{% if kube_reserved is defined and kube_reserved|bool %}
Slice={{ kube_reserved_cgroups_for_service_slice }}
Is there an equivalent for cri-o and docker (cri-dockerd) for these options? If not, then the documentation should explicitly state that this capability is only available with containerd.
> Is there an equivalent for cri-o and docker (cri-dockerd) for these options?
Yes, I have added these options to each of these container manager configuration files. You can see the changes in the following files:
- https://github.com/kubernetes-sigs/kubespray/pull/9209/files#diff-32032985c3f2d4c78c99cfe13e3db695bf4f766e7718b45889c1b8db527105a3
- https://github.com/kubernetes-sigs/kubespray/pull/9209/files#diff-2ea93b939c50e808ef0af1426d7bbb5ad2b243db9bf5106c4050fe1d110e4c26
- https://github.com/kubernetes-sigs/kubespray/pull/9209/files#diff-510a4a526ce7fbd9000f27bbafc2403ea9bccf5cee16643a51e533ec910b3332
- https://github.com/kubernetes-sigs/kubespray/pull/9209/files#diff-2627b48dffe270eb911768d8b3c9d3fb287ad2c91130280729d4dae1462cf333
I'm sorry, I misunderstood the yellow "Pending" decorator on my comments, so the above comment was only submitted just now (it was actually written many days ago).
Hi @shelmingsong, please rebase this PR on the latest state of the master branch so that it can pass the CI tests.
Force-pushed from de87322 to 044b631
Hi @cristicalin, thank you for reminding me.
Great work @shelmingsong, thank you! /approve
kubelet_runtime_cgroups: "/systemd/system.slice"
kubelet_kubelet_cgroups: "/systemd/system.slice"
kubelet_runtime_cgroups: "{{ kube_reserved_cgroups }}/{{ container_manager }}.service"
kubelet_kubelet_cgroups: "{{ kube_reserved_cgroups }}/kubelet.service"
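To make the change concrete: assuming kube_reserved_cgroups renders to /kube.slice and container_manager is containerd, the defaults before and after the PR would compare roughly as follows.

```yaml
# Before the PR (static values):
# kubelet_runtime_cgroups: "/systemd/system.slice"
# kubelet_kubelet_cgroups: "/systemd/system.slice"

# After the PR, with kube_reserved_cgroups = "/kube.slice" and container_manager = "containerd" (assumed):
kubelet_runtime_cgroups: "/kube.slice/containerd.service"
kubelet_kubelet_cgroups: "/kube.slice/kubelet.service"
```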
Hi @shelmingsong, thank you very much.
After this PR, the default kubelet_kubelet_cgroups would change from /systemd/system.slice to /kube.slice/kubelet.service, and would no longer sit under the systemd-managed system.slice.
Is the default kubelet config changed after the PR?
@yankay Sorry for taking so long to reply.
Yes, the default kubelet configuration will be changed.
I'm sorry, I misunderstood the yellow "Pending" decorator on my comments, so the above comment was only submitted just now (it was actually written many days ago).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
@shelmingsong Wow sorry I guess we missed this one, I'll do my best to take the time and review it, big one 😅
> /remove-lifecycle stale
> @shelmingsong Wow sorry I guess we missed this one, I'll do my best to take the time and review it, big one 😅
it's okay (*^▽^*)
/lgtm
@shelmingsong Thank you for this feature, nice one 🙇
🚀
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cristicalin, floryut, shelmingsong. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
* optimize cgroups settings for node reserved
* fix
* set cgroup slice for multi container engine
* set cgroup slice for crio
* add reserved cgroups variables to sample files
* Compatible with cgroup path for different container managers
* add cgroups doc
* fix markdown
Hello everyone, didn't these changes add some misconfiguration in kubelet.env?
I think there are two separate functions at play here:

1. Reserving resources for the Kubernetes system daemons (kubeReserved / systemReserved), which simply reduces Node Allocatable.
2. Enforcing those reservations on the daemons' own cgroups (kubeReservedCgroup / systemReservedCgroup together with enforce-node-allocatable).

Note from the docs that the kubeReservedCgroup setting is optional. It is not required for reservation; it is for enforcement. You can (and should) set kubeReserved without kubeReservedCgroup.
The behaviour before this MR was that by default, resource reservation was applied (via
I don't think that is true, because the safe standard way to reserve resources for Kubernetes daemons should be to define kubeReserved resources without specifying the optional and unnecessary kubeReservedCgroup setting. After this MR that is no longer possible, and there are no kube-reserved resources by default. I spent many hours trying to figure this out; if I misunderstood or made an incorrect assumption, I would be happy for a clarification. Anyway, can someone please re-open #9692 to address this? @cristicalin @yankay? Thanks!
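For context, the reservation-only setup described above (kubeReserved without kubeReservedCgroup) would look something like this sketch; the values are illustrative.

```yaml
# Sketch: reserve resources for kube daemons without enforcing them on a dedicated cgroup.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 100m
  memory: 256Mi
# No kubeReservedCgroup here: Node Allocatable is still reduced by kubeReserved,
# but only the pods cgroup is enforced.
enforceNodeAllocatable:
  - pods
```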
I have tried #11367 to fix this.
What type of PR is this?
What this PR does / why we need it:
If the cgroups driver is systemd and kube_reserved or system_reserved is set, this PR sets kubeReservedCgroup or systemReservedCgroup in kubelet-config.v1beta1.yaml.j2. For the cgroups setup and hierarchy, refer to this article.
After the setup, the cgroups hierarchy is as follows:
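The hierarchy itself did not survive in this view; below is a sketch of what it looks like assuming the systemd cgroup driver with kube.slice for kube-reserved and system.slice for system-reserved.

```
/sys/fs/cgroup/
├── kube.slice/              # kubeReservedCgroup: kubelet and container runtime
│   ├── containerd.service
│   └── kubelet.service
├── kubepods.slice/          # pod cgroups managed by the kubelet
└── system.slice/            # systemReservedCgroup: other system daemons
```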
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?: