-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Graduate Evented PLEG to Beta" #122697
Revert "Graduate Evented PLEG to Beta" #122697
Conversation
This reverts commit d971809.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @pohly @aojea /assign @harche @mrunalp @dchen1107 |
/sig node |
/test pull-kubernetes-e2e-kind-beta-features |
@@ -970,7 +969,7 @@ var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureS | |||
|
|||
DynamicResourceAllocation: {Default: false, PreRelease: featuregate.Alpha}, | |||
|
|||
EventedPLEG: {Default: false, PreRelease: featuregate.Beta}, // off by default, requires CRI Runtime support | |||
EventedPLEG: {Default: false, PreRelease: featuregate.Alpha}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the discussion on Slack I was assuming that this must have been a change that went recently into master for 1.30 and some real bug.
However, now that I look at this I'm coming to a slightly different conclusion: as the original comment says, this is not a feature that can be enabled in all environments. That comment got lost when promoting to beta.
I think this is a misuse of the feature gate mechanism. If this cannot be enabled unconditionally, then there should be a separate kubelet (?) command line flag or configuration option for enabling it.
As it stands, this can only be promoted to GA (which implies removing the feature gate eventually!) if all CRI runtimes support this. Is that the goal? I haven't checked the KEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for one or other reason, this is clear no beta quality, I just found also that this issue was already reported in alpha an never addresses #121003
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a precedence for downgrading a feature after it was released?
I agree that it should have remained alpha. I just want to understand whether downgrading really is an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a misuse of the feature gate mechanism. If this cannot be enabled unconditionally, then there should be a separate kubelet (?) command line flag or configuration option for enabling it.
Evented PLEG requires a CRI runtime that supports generating those events. KEP talks about falling back on Generic PLEG in case the kubelet with Evented PLEG is paired with the runtime that doesn't support generating the required events. So ideally, this feature should have worked unconditionally whether a CRI runtime supports Evented PLEG or not.
This has been handled in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pacoxu's comment here indicates that PR fixed the issue caused by the Evented PLEG in the beta job, unless I misread it somehow.
After including the changes in that PR, is it failing with for some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a precedence for downgrading a feature after it was released?
yes, when a feature was determined not to meet graduation requirements and cause regressions
ConfigMap rendering issue was found in the 1.25.0 release. When ConfigMaps get updated within the API, they do not get rendered to the resulting pod's filesystem by the Kubelet. The feature has been reverted to alpha in the 1.25.1 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatever change we make here should be consistently backported to 1.27/1.28/1.29
since the feature was never enabled by default, changing the state will not impact users who are currently explicitly enabling and using it (they can still do that if they want), but does accurately convey the stability level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although from the slack conversation it seems if the Evented PLEG feature is completely disabled then the beta job passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job passes when it doesn't enable Evented PLEG. It should pass also when it is enabled because it should be safe to enable all beta features.
I'm not sure whether #122124 is a proper fix for this problem. I'll let people more familiar with the feature discuss that in that PR.
Even if that is fixed, is the quality and test coverage of Evented PLEG really good enough to call it beta?
The usual graduation criteria for beta are "e2e tests completed and enabled" and for beta, links are expected.
It looks like for PLEG, using the feature in a "real" cluster wasn't considered and not tested (https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3386-kubelet-evented-pleg#graduation-criteria only mentions E2E node tests). E2E node testing is not sufficient.
Given that there were undetected problems, I'm 👍 for downgrading to alpha, updating the KEP with a more complete test plan, then upgrading to beta when known issues are fixed and test coverage is more complete.
/cc @aojea
This should have never go to beta without cluster test running and because it also had a critical issue open during 1.26 #121003 that turned out to be the problem why we are reverting now /lgtm |
LGTM label has been added. Git tree hash: c2255c9ae68d7b19d231bb0e1dc15dbeaa6e7f1e
|
/test pull-kubernetes-e2e-gce-cos-alpha-features |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, mrunalp, pacoxu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold |
(Waiting for signal from pull-kubernetes-e2e-gce-cos-alpha-features) |
At a time of promotion we were debating if that was the right decision as we still need to work on extensive test coverage for the feature - including performance and consistency testing. The hope was that promoting to beta will simplify enabling it on certain environment and will allow to ensure a better coverage sooner. As we have not progressed much recently it may be a good idea to make it alpha again to ensure that the expectations for the feature quality are adequate. So I am supportive of this change. /assign @dchen1107 |
/hold cancel |
Thanks for opening the cherry picks. Please copy the release note from here to those PRs as well, and open a PR to update the state and milestones in the KEP |
Release notes are updated. I opened kubernetes/enhancements#4400 to sync the status. |
…97-upstream-release-1.27 Automated cherry pick of #122697: Revert "Graduate Evented PLEG to Beta"
…97-upstream-release-1.29 Automated cherry pick of #122697: Revert "Graduate Evented PLEG to Beta"
…97-upstream-release-1.28 Automated cherry pick of #122697: Revert "Graduate Evented PLEG to Beta"
This reverts commit d971809.
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/121345/pull-kubernetes-e2e-kind-beta-features/1745324395255042048 first pass of this job with disabling this FG EventedPLEG. See #121345 for more detailed CI status.
Which issue(s) this PR fixes:
revert as there is a known issue #121349 that will make static pod failed to start.
It also makes the https://testgrid.k8s.io/sig-testing-kind#kind-master-beta failing for cluster init.
Special notes for your reviewer:
Does this PR introduce a user-facing change?