-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graduate Evented PLEG to Beta #39913
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @harche
(I think that) you should also update https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/ to mention this beta feature gate and why you might care about it.
If that's not the right page to update, we should find which is the right page and update that instead. For beta features, even those disabled by default, we nearly document why you might want to turn them on and the work you need to do associated with enabling the feature.
If you think there are tasks to document - eg: migrate to using CRI notifications for Pod events - then please add that task page either in this PR or in a separate PR that also mentions the relevant KEP.
IMO, the right place for such page would https://kubernetes.io/docs/concepts/configuration/ I will add a page describing Evented PLEG and its usage. Thanks. |
That isn't the best home. If you want to document pod lifecycle events as a concept, put that in https://kubernetes.io/docs/concepts/containers/ or https://kubernetes.io/docs/concepts/workloads/pods/ If you want to explain the cluster operator angle, the content should live in either https://kubernetes.io/docs/concepts/cluster-administration/ or https://kubernetes.io/docs/tasks/administer-cluster/ (make the choice depending on whether you are explaining how to perform a task, such as migrating to CRI notifications, or providing general context). |
It's a configuration of the kubelet (and CRI Runtime) that affects how it maintains its internal cache of pod statuses. That's the reason I thought configuration made sense. This is not directly connected to any concepts related to containers or pods, neither is it related to any operator or cluster administration. |
Coming to think of it, maybe https://kubernetes.io/docs/tasks/administer-cluster/ makes sense. |
/sig node |
f9e8e9d
to
f9df2ac
Compare
/hold |
f9df2ac
to
dfd2fdc
Compare
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This will need some work before we can merge it, I'm afraid. For example, I think we should advice readers to drain the nodes they're reconfiguring, or at least empower them to consider doing that.
What do you think @harche ?
1. Start the `CRI Runtime` with the `Evented PLEG` support. e.g. | ||
1. Containerd version 1.7+ | ||
2. CRI-O v1.26+ and also, start the cri-o daemon with the flag `--enable-pod-events=true` or using a drop in config like, | ||
```toml | ||
[crio.runtime] | ||
enable_pod_events: true | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we're documenting for different container runtimes, we quite often use tabs to let readers click to what they're using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks.
## Why switch to Evented PLEG? | ||
|
||
* The current `Generic PLEG` incurs non-negligible overhead due to frequent polling of container statuses. | ||
* This overhead is exacerbated by Kubelet's parallelism, limiting its scalability and causing poor performance and reliability problems. | ||
* The goal of `Evented PLEG` is to reduce unnecessary work during inactivity by replacing periodic polling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go before the list of prerequisites.
|
||
|
||
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do you need to enable any feature gates on the control plane?
- Do you need a specific Kubernetes version?
- Do you need administrator-level access to the node(s) you're managing?
- Do you need a container runtime that has support for a particular CRI feature?
and we should also cover, somewhere in this page:
- do you need to roll this out to all nodes at once, or can you phase it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do you need to enable any feature gates on the control plane?
No.
- Do you need a specific Kubernetes version?
1.26+
- Do you need administrator-level access to the node(s) you're managing?
Yes
- Do you need a container runtime that has support for a particular CRI feature?
Yes
and we should also cover, somewhere in this page:
- do you need to roll this out to all nodes at once, or can you phase it?
You can phase it. It is only a node specific feature that affects how the given kubelet manages the pod status cache internally.
$ grep EventedPLEG /tmp/kubelet.log | ||
I0314 11:10:13.909915 1105457 feature_gate.go:249] feature gates: &{map[EventedPLEG:true]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a shell script. See https://kubernetes.io/docs/contribute/style/style-guide/#separate-commands-from-output
content/en/docs/tasks/administer-cluster/switch-to-evented-pleg.md
Outdated
Show resolved
Hide resolved
content/en/docs/tasks/administer-cluster/switch-to-evented-pleg.md
Outdated
Show resolved
Hide resolved
* The current `Generic PLEG` incurs non-negligible overhead due to frequent polling of container statuses. | ||
* This overhead is exacerbated by Kubelet's parallelism, limiting its scalability and causing poor performance and reliability problems. | ||
* The goal of `Evented PLEG` is to reduce unnecessary work during inactivity by replacing periodic polling. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you drain the node first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
dfd2fdc
to
643746c
Compare
yes, it makes sense. |
@sftim Thanks for your valuable feedback. I have updated the PR with those recommendations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more feedback
@@ -0,0 +1,76 @@ | |||
--- | |||
title: Switching from polling to CRI event-based updates to container status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
title: Switching from polling to CRI event-based updates to container status | |
title: Switching From Polling to CRI Event-based Updates to Container Status |
{{< feature-state for_k8s_version="v1.26" state="beta" >}} | ||
|
||
<!-- overview --> | ||
This page shows how to use `Evented PLEG` (CRI event-based updates to container status) in Kubelet which attempts to lower the resource consumption by avoiding frequent polling for container statuses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This page shows how to use `Evented PLEG` (CRI event-based updates to container status) in Kubelet which attempts to lower the resource consumption by avoiding frequent polling for container statuses. | |
This page shows how to migrate nodes to use event based updates for container status. The event-based | |
implementation reduces node resource consumption by the kubelet, compared to the legacy approach | |
that relies on polling. | |
You may know this feature as _evented Pod lifecycle event generator (PLEG)_. That's the name used | |
internally within the Kubernetes project for a key implementation detail. |
?
* This support for this feature is available in Kubernetes {{< skew currentVersion >}}; | ||
Kubernetes v1.26 as a beta feature, disabled by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This support for this feature is available in Kubernetes {{< skew currentVersion >}}; | |
Kubernetes v1.26 as a beta feature, disabled by default. | |
You need to run a version of Kubernetes that provides this feature. | |
Kubernetes {{< skew currentVersion >}} includes beta support for event-based container | |
status updates. The feature is beta and is disabled by default. | |
{{< version-check >}} | |
If you are running a different version of Kubernetes, check the documentation for that release. | |
|
||
* This support for this feature is available in Kubernetes {{< skew currentVersion >}}; | ||
Kubernetes v1.26 as a beta feature, disabled by default. | ||
* Please make sure the node is [drained](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) before proceeding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be part of the steps and not a prerequisite.
* Start the `CRI Runtime` with the `Evented PLEG` support. | ||
{{< tabs name="tab_with_code" >}} | ||
{{< tab name="Containerd" codelang="bash" >}} | ||
Version 1.7+ | ||
{{< /tab >}} | ||
{{< tab name="CRI-O" codelang="bash" >}} | ||
v1.26+ and also, start the cri-o daemon with the flag `--enable-pod-events=true` or using a drop in config like, | ||
|
||
[crio.runtime] | ||
enable_pod_events: true | ||
|
||
{{< /tab >}} | ||
{{< /tabs >}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, we should have the reader check that their runtime can support events. They don't have to have already reconfigured it (eg CRI-O and enable_pod_events
).
|
||
|
||
1. Start the Kubelet with the [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) `EventedPLEG` enabled. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could hint that you do this by editing its configuration file, and then you restart the kubelet.
2. Verify that `Evented PLEG` is in use: | ||
|
||
``` | ||
$ grep EventedPLEG /tmp/kubelet.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rely on the log path being /tmp
?
Signed-off-by: Harshal Patil <[email protected]> Co-authored-by: Tim Bannister <[email protected]>
643746c
to
efa8181
Compare
Thanks @sftim again for those suggestions. Considering the significant amount of suggestions provided by I have added you as a Co-author in the commit, hope that is fine by you. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tengqm The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
LGTM label has been added. Git tree hash: 04ce26686dd5fdeee2f809aad148abd193f496dd
|
Signed-off-by: Harshal Patil [email protected]
Add EventedPLEG to FeatureGates doc.
Corresponding PR - kubernetes/kubernetes#115967
Enhancement - kubernetes/enhancements#3386