Need clarification: KEP stages and transitions #2960

Closed · thockin opened this issue Sep 8, 2021 · 16 comments
Labels: area/enhancements, sig/architecture, sig/release

@thockin (Member) commented Sep 8, 2021

From Slack (summarized):

Is the KEP freeze "you must have all KEP changes merged" or "you must have all intended-to-change KEPs in the sheet"?

For example: I have some KEPs which we WANT to move from Alpha->Beta or Beta->GA, but the code to do those things is not done yet and MIGHT slip. I don't want to change KEP stage text until we know whether it is in or not, and that won't be until well after the enhancement freeze.

That's not an exception - that's the norm, I think.

I think the only sane interpretation is that the KEP deadline indicates which KEPs are being tracked, but every KEP should be considered "at risk" until stage changes, and when stage changes, that should be authoritative. What I DON'T want is to have to remember to revert changes to a KEP if the code misses the release. That will fail and leave ambiguous results.

Do we need a 2-phase commit - a distinction between target and actual?

We need clarity on the enhancement freeze - what EXACTLY is the deadline for? What sorts of KEP changes are allowed after the enhancement freeze?

Rey said "Enhancements Freeze indicates which KEPs are being tracked ... What the release team looks for in Enhancements Freeze is for the KEP to be marked as "implementable", has the correct stage for the release, ... and an approved PRR"

This says to me that if I want to move to GA in 1.23, I update the KEP to say GA for stage. If the code doesn't make it, we have a KEP that is ambiguous (or flat-out lying). Expecting people to come back and revert the stage is unrealistic.

IMO - stage should be a trailing indicator. Updating stage should be on the success path, not on the failure path.

Some SIGs/reviewers are resetting the status to implemented/implementable for each stage. I don't think that was the intention.

Between stage, milestone, and status there's overlap and ambiguity. For example, let's look at a new KEP.

It starts as:

stage: alpha
status: provisional
alpha: 
beta: 
ga: 

We iterate on that and it ultimately gets approved:

stage: alpha
status: implementable
alpha: v1.X
beta: v1.Y
ga: v1.Z

Development work happens and we release v1.X. Given the above fields, I can assume that this KEP is alpha in v1.X. But if, for some reason, the code missed the boat, someone has to go back and edit the KEP to change alpha, beta, and ga. That is prone to failure, and if it is forgotten, people might draw the wrong conclusion.

I think we should instead do something like:

stage: prealpha
status: provisional
alpha: 
beta: 
ga: 

which gets approved to:

stage: prealpha
status: implementable
alpha: v1.X
beta: v1.Y
ga: v1.Z

When we release v1.X, we can update this to:

stage: alpha
status: implementable
alpha: v1.X
beta: v1.Y
ga: v1.Z

This means that someone has to touch the KEP - the same as before, but the failure mode is better. People won't accidentally assume it is ready to use. It would stay in this state until all the code for beta is merged, at which point it becomes:

stage: beta
status: implementable
alpha: v1.X
beta: v1.Y
ga: v1.Z

Obviously, we could model this in other ways. E.g. we could have a per-stage status:

status:
  alpha: implemented
  beta: implementable
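
To make that variant concrete, here's a minimal sketch of how CI could validate it. The flat per-stage status map and the ordering rule are assumptions for illustration, not the current kep.yaml schema:

```python
# Hypothetical per-stage status map; NOT the current kep.yaml schema.
VALID_STATUSES = {"provisional", "implementable", "implemented"}
STAGE_ORDER = ("alpha", "beta", "stable")

def validate_per_stage_status(status_by_stage):
    """Reject out-of-order graduation, e.g. beta implemented before alpha."""
    errors = []
    earlier_unimplemented = False
    for stage in STAGE_ORDER:
        status = status_by_stage.get(stage)
        if status is None:
            continue
        if status not in VALID_STATUSES:
            errors.append(f"{stage}: unknown status {status!r}")
        elif status == "implemented" and earlier_unimplemented:
            errors.append(f"{stage}: implemented, but an earlier stage is not")
        if status != "implemented":
            earlier_unimplemented = True
    return errors

# The example from the comment above: alpha done, beta still in progress.
assert validate_per_stage_status({"alpha": "implemented", "beta": "implementable"}) == []
```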
k8s-ci-robot added the needs-sig label Sep 8, 2021
@thockin (Member, Author) commented Sep 8, 2021

/sig release

k8s-ci-robot added the sig/release label and removed the needs-sig label Sep 8, 2021
@justaugustus (Member) commented:

cc: @kubernetes/sig-release @kubernetes/enhancements
/area enhancements

k8s-ci-robot added the area/enhancements label Sep 8, 2021
justaugustus added this to the keps-beta milestone Sep 8, 2021
@johnbelamaric (Member) commented:

Thanks, Tim. Here's another case where the CI makes too many assumptions about the various fields being set correctly at the correct times:

#563 (comment)

We had intended to have kepctl manage this metadata in a consistent way, but that has stalled. We need a better solution in the interim.

@thockin (Member, Author) commented Sep 8, 2021

If we can define the state machine, we can code it up as CI: "Error: can't transition directly from state FOO to state BAR".
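
A minimal sketch of what that check might look like, assuming the hypothetical `prealpha` ladder from the issue description (this is illustrative, not existing kepctl/CI tooling):

```python
# Hypothetical stage ladder from the proposal above. CI would diff the old
# and new kep.yaml and reject any change that skips a rung of the ladder.
STAGES = ["prealpha", "alpha", "beta", "stable"]

def check_transition(old_stage, new_stage):
    old_i, new_i = STAGES.index(old_stage), STAGES.index(new_stage)
    if new_i not in (old_i, old_i + 1):
        raise ValueError(
            f"Error: can't transition directly from stage {old_stage!r} "
            f"to stage {new_stage!r}"
        )

check_transition("prealpha", "alpha")  # OK: one step forward
check_transition("alpha", "stable")    # raises: skips beta
```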

@wojtek-t (Member) commented Sep 9, 2021

Copying my comment from the Slack channel:

I think that what we're missing is basically a "two-phase commit".
In the ideal world, the way I see it is:
- you update the KEP for feature freeze with your intention
- then you do the actual implementation
- once all the implementation is done (including metrics, tests, etc.), you mark the KEP with your "actual intent"
- and only then are you allowed to send a PR changing the value of a feature-gate

That would give us a way to ensure that the code actually graduates only when it should...
and at the third step you're allowed to update the KEP if needed (e.g. something that wasn't predicted was discovered).

Which means that the state machine I'm proposing is (well, "state machine" seems like too big a word for it, as transitions can only go between adjacent states):

provisional -> intended for Alpha -> code-complete for Alpha [-> graduated Alpha ??] -> intended for Beta -> code-complete for Beta [-> graduated beta ??] -> intended for Stable -> code-complete for Stable -> implemented

In the ideal world, how I would see it is:

  • we have feature-freeze as it is now [for putting KEPs into the "intended for X" state]
  • then there is code-freeze [but PRs touching the feature-gates file are not allowed at that point]
  • then there is graduation-freeze [where you mark your KEP as code-complete for X, and approvers check that everything that should happen really happened: not just code, but also tests, metrics, ...]
  • based on that, you're then allowed to bump your feature-gate

I admit that adds more burden on the approvers, but given that code approvers are not necessarily the same people as KEP approvers, in the current maturity state of the project I think it would actually be worth it.
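
For illustration only, a tiny sketch of the final gate in this two-phase scheme; the `code_complete_for` field is a made-up stand-in for however "code-complete for X" gets recorded:

```python
# Hypothetical two-phase-commit check: a PR promoting a feature gate is
# allowed only once the owning KEP is marked code-complete for that stage.
def gate_promotion_allowed(kep, target_stage):
    # "code_complete_for" would be set at graduation-freeze, after approvers
    # verify that code, tests, and metrics all actually landed.
    return kep.get("code_complete_for") == target_stage

kep = {"name": "my-feature", "code_complete_for": "alpha"}
assert gate_promotion_allowed(kep, "alpha")      # gate may move to alpha
assert not gate_promotion_allowed(kep, "beta")   # beta not code-complete yet
```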

@kikisdeliveryservice (Member) commented Sep 22, 2021

Q: is this a speculative issue? Because when I was on the release team, we did go through the KEPs post-code-freeze to ensure all PRs intended to merge did merge, and if not, we asked the authors to update the KEPs.

While I think there is more granular control we could have over the KEPs, I worry that forcing people to update KEPs multiple times within a release, after freeze, would create a new set of missed updates: authors who previously had no problem keeping their KEPs up to date could end up creating more issues than currently exist.

Thinking of something that is lower friction but provides the verification that @thockin rightly wants about the state: could we ask SIGs to somehow verify the state of their KEPs post-code-freeze? I'd be inclined to do this via the release team for one release, see how it works out, gather feedback, and then memorialize it in the actual KEP template.

Further, I think a good point is made that until the release is cut, all the KEP statuses are tentative. That is something we should communicate more clearly to outside observers, even outside of the KEP itself - perhaps via docs in the enhancements repo and on the KEP website that is being worked on, something like: "All KEPs targeted at the current in-progress release $RELEASENUMBER are tentative. Please wait until the release is cut on $DATE for the final list and status of KEPs in this release."

@kikisdeliveryservice (Member) commented:

As a final note, "implemented" is generally used for KEPs that are completely finished, post-GA. I think the goal should be that at the end of the release, only the verifiably "finished" KEPs (for their stage) have the release and graduation in their kep.yaml. IMO we should be removing/updating inaccurate KEPs as opposed to adding a hurdle for accurate KEPs.

I'd like to give this a bit more thought, but thanks for raising it.

kikisdeliveryservice self-assigned this Sep 22, 2021
kikisdeliveryservice added the sig/architecture label Sep 22, 2021
@wojtek-t (Member) commented:

> Q: is this a speculative issue? Because when I was on the release team, we did go through the KEPs post-code-freeze to ensure all PRs intended to merge did merge, and if not, we asked the authors to update the KEPs.

This isn't a speculative question - this is a real thing that happened.
There are multiple aspects to this:

  • you were going through PRs that linked the enhancement issue, right? There is no guarantee that all of them were actually linked
  • you were looking at PRs that were expected to merge; the issue I found was also the opposite: things that shouldn't have merged were merged - see [WIP] Promote KEP-1672 to GA #2938 (comment)

@spiffxp (Member) commented Sep 23, 2021

FWIW, the logic itself is pretty easy to change; the tangly bits are in the data, since it's mostly hand-edited YAML that has historically not been validated against much.

So for example, if we decide to change `required = !(stage != implementable)` to `required = stage == implementable || stage == implemented`, then we need to update... *checks notes* ...about 71 KEPs to have a PRR file.

We could use latest-milestone to avoid checking this for KEPs earlier than v1.21, but this is yet another field that's not common and easily falls out of date. Tracking down what the value should actually be and/or requiring people to update it is toilsome. Stuff like this starts to make me think this issue is a subset of #2988.
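
A sketch of the revised predicate with such a latest-milestone cutoff (illustrative only; it assumes well-formed "v1.N" milestone strings):

```python
# Sketch: PRR file required for implementable AND implemented KEPs, but
# only for KEPs whose latest-milestone is v1.21 or newer; older KEPs
# predate the PRR process and are grandfathered.
def prr_required(stage, latest_milestone):
    if stage not in ("implementable", "implemented"):
        return False
    if not latest_milestone:
        return False  # missing field: can't tell (this is the tangly bit)
    minor = int(latest_milestone.lstrip("v").split(".")[1])
    return minor >= 21

assert prr_required("implemented", "v1.22")
assert not prr_required("implemented", "v1.20")
assert not prr_required("provisional", "v1.22")
```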

@spiffxp (Member) commented Oct 8, 2021

FWIW, the following KEPs lack a latest-milestone field, which is the blocker to switching the PRR check to trigger on stage == implemented:

  • keps/provider-aws/2313-aws-k8s-tester/kep.yaml
  • keps/provider-aws/2314-custom-endpoints-support-for-aws-cloud-provider/kep.yaml
  • keps/provider-aws/2315-aws-loadbalancer-prefix/kep.yaml
  • keps/provider-aws/629-alb-ingress-controller/kep.yaml
  • keps/provider-aws/630-ebs-csi-driver/kep.yaml
  • keps/sig-api-machinery/1101-immutable-fields/kep.yaml
  • keps/sig-api-machinery/1152-less-object-serializations/kep.yaml
  • keps/sig-api-machinery/1623-standardize-conditions/kep.yaml
  • keps/sig-api-machinery/1872-manifest-based-admission-webhooks/kep.yaml
  • keps/sig-api-machinery/2332-pruning-for-custom-resources/kep.yaml
  • keps/sig-api-machinery/2333-legacyflags-kflag/kep.yaml
  • keps/sig-api-machinery/2335-vanilla-crd-openapi-subset-structural-schemas/kep.yaml
  • keps/sig-api-machinery/2337-k8s.io-group-protection/kep.yaml
  • keps/sig-api-machinery/2340-Consistent-reads-from-cache/kep.yaml
  • keps/sig-api-machinery/2341-enabling-clients-to-tell-if-resource-endpoints-serve-the-same-set-of-objects/kep.yaml
  • keps/sig-api-machinery/2342-exposing-hashed-storage-versions-via-the-discovery-API/kep.yaml
  • keps/sig-api-machinery/2343-automated-storage-version-migration-with-storage-version-hash/kep.yaml
  • keps/sig-api-machinery/492-admission-webhooks/kep.yaml
  • keps/sig-api-machinery/575-crd-defaulting/kep.yaml
  • keps/sig-api-machinery/576-dry-run/kep.yaml
  • keps/sig-api-machinery/598-crd-conversion-webhook/kep.yaml
  • keps/sig-api-machinery/956-watch-bookmark/kep.yaml
  • keps/sig-apps/2360-optional-service-environment-variables/kep.yaml
  • keps/sig-apps/706-portable-service-definitions/kep.yaml
  • keps/sig-architecture/0000-kep-process/kep.yaml
  • keps/sig-architecture/1194-prod-readiness/kep.yaml
  • keps/sig-architecture/1618-conformance-profiles/kep.yaml
  • keps/sig-architecture/1659-standard-topology-labels/kep.yaml
  • keps/sig-architecture/617-improve-kep-implementation/kep.yaml
  • keps/sig-architecture/960-conformance-behaviors/kep.yaml
  • keps/sig-auth/1513-certificate-signing-request/kep.yaml
  • keps/sig-auth/1687-hierarchical-namespaces-subproject/kep.yaml
  • keps/sig-auth/279-limit-node-access/kep.yaml
  • keps/sig-auth/600-dynamic-audit-configuration/kep.yaml
  • keps/sig-auth/789-harden-default-discover-bindings/kep.yaml
  • keps/sig-autoscaling/117-hpa-metrics-specificity/kep.yaml
  • keps/sig-autoscaling/853-configurable-hpa-scale-velocity/kep.yaml
  • keps/sig-cli/2229-kubectl-xdg-base-dir/kep.yaml
  • keps/sig-cli/2377-Kustomize/kep.yaml
  • keps/sig-cli/2379-kubectl-plugins/kep.yaml
  • keps/sig-cli/2380-data-driven-commands-for-kubectl/kep.yaml
  • keps/sig-cli/2381-future-of-kubectl-cp/kep.yaml
  • keps/sig-cli/2384-kustomize-file-processing-integration/kep.yaml
  • keps/sig-cli/2386-kustomize-subcommand-integration/kep.yaml
  • keps/sig-cli/491-kubectl-diff/kep.yaml
  • keps/sig-cloud-provider/1179-building-without-in-tree-providers/kep.yaml
  • keps/sig-cloud-provider/2133-out-of-tree-credential-provider/kep.yaml
  • keps/sig-cloud-provider/2392-cloud-controller-manager/kep.yaml
  • keps/sig-cloud-provider/2394-support-out-of-tree-AWS-cloud-provider/kep.yaml
  • keps/sig-cloud-provider/668-out-of-tree-gce/kep.yaml
  • keps/sig-cloud-provider/669-out-of-tree-openstack/kep.yaml
  • keps/sig-cloud-provider/670-out-of-tree-vsphere/kep.yaml
  • keps/sig-cloud-provider/671-out-of-tree-ibm/kep.yaml
  • keps/sig-cloud-provider/azure/2328-ccm-instance-metadata/kep.yaml
  • keps/sig-cloud-provider/providers/0000-cloud-provider-template/kep.yaml
  • keps/sig-cloud-provider/providers/2530-alibaba-cloud/kep.yaml
  • keps/sig-cluster-lifecycle/addons/2492-Addons-via-Operators/kep.yaml
  • keps/sig-cluster-lifecycle/addons/2494-Manifest-Bundle/kep.yaml
  • keps/sig-cluster-lifecycle/clusterapi/2495-Kubernetes-Cluster-Management-API/kep.yaml
  • keps/sig-cluster-lifecycle/etcdadm/2496-etcdadm/kep.yaml
  • keps/sig-cluster-lifecycle/image-builder/2497-Kubernetes-Image-Builder/kep.yaml
  • keps/sig-cluster-lifecycle/kubeadm/1177-kubeadm-with-kustomize/kep.yaml
  • keps/sig-cluster-lifecycle/kubeadm/2498-Kubeadm-Config-versioning/kep.yaml
  • keps/sig-cluster-lifecycle/kubeadm/2500-kubeadm-join-control-plane-workflow/kep.yaml
  • keps/sig-cluster-lifecycle/kubeadm/2501-kubeadm-phases-to-beta/kep.yaml
  • keps/sig-cluster-lifecycle/kubeadm/2505-Kubeadm-operator/kep.yaml
  • keps/sig-cluster-lifecycle/kubeadm/378-bootstrap-checkpointing/kep.yaml
  • keps/sig-contributor-experience/0000-community-forum/kep.yaml
  • keps/sig-contributor-experience/1553-issue-triage/kep.yaml
  • keps/sig-contributor-experience/2225-contributor-site/kep.yaml
  • keps/sig-docs/1326-third-party-content-in-docs/kep.yaml
  • keps/sig-instrumentation/1013-metrics-watch-api/kep.yaml
  • keps/sig-instrumentation/1206-metrics-overhaul/kep.yaml
  • keps/sig-network/1024-nodelocal-cache-dns/kep.yaml
  • keps/sig-network/2447-Make-kube-proxy-service-abstraction-optional/kep.yaml
  • keps/sig-network/2448-Remove-kube-proxy-automatic-clean-up-logic/kep.yaml
  • keps/sig-network/2449-move-externalDNS-out-of-kubernetes-incubator/kep.yaml
  • keps/sig-network/2450-Remove-knowledge-of-pod-cluster-CIDR-from-iptables-rules/kep.yaml
  • keps/sig-network/265-ipvs-based-load-balancing/kep.yaml
  • keps/sig-network/2829-gateway-api-to-k8s-io/kep.yaml
  • keps/sig-network/427-coredns/kep.yaml
  • keps/sig-network/566-coredns-default/kep.yaml
  • keps/sig-network/580-pod-readiness-gates/kep.yaml
  • keps/sig-network/980-load-balancer-finalizers/kep.yaml
  • keps/sig-node/135-seccomp/kep.yaml
  • keps/sig-node/1547-building-kubelet-without-docker/kep.yaml
  • keps/sig-node/166-taint-based-eviction/kep.yaml
  • keps/sig-node/495-pod-pid-namespace/kep.yaml
  • keps/sig-node/589-efficient-node-heartbeats/kep.yaml
  • keps/sig-node/688-pod-overhead/kep.yaml
  • keps/sig-node/753-sidecar-containers/kep.yaml
  • keps/sig-node/757-pid-limiting/kep.yaml
  • keps/sig-node/950-liveness-probe-holdoff/kep.yaml
  • keps/sig-release/1731-publishing-packages/kep.yaml
  • keps/sig-release/1732-artifact-management/kep.yaml
  • keps/sig-release/1733-release-notes/kep.yaml
  • keps/sig-scheduling/2372-node-labels-quota/kep.yaml
  • keps/sig-scheduling/382-taint-node-by-condition/kep.yaml
  • keps/sig-scheduling/548-schedule-daemonset-pods/kep.yaml
  • keps/sig-scheduling/583-coscheduling/kep.yaml
  • keps/sig-scheduling/986-resource-quota-scope-selectors/kep.yaml
  • keps/sig-storage/121-local-persistent-volumes/kep.yaml
  • keps/sig-storage/1979-object-storage-support/kep.yaml
  • keps/sig-storage/2451-service-account-token-volumes/kep.yaml
  • keps/sig-storage/351-raw-block-support/kep.yaml
  • keps/sig-storage/554-maximum-volume-count/kep.yaml
  • keps/sig-storage/557-csi-topology/kep.yaml
  • keps/sig-storage/559-volume-subpath-expansion/kep.yaml
  • keps/sig-storage/565-csi-block-support/kep.yaml
  • keps/sig-storage/603-csi-pod-info/kep.yaml
  • keps/sig-storage/770-csi-skip-attach/kep.yaml
  • keps/sig-storage/989-extend-datasource/kep.yaml
  • keps/sig-windows/1043-windows-security-context/kep.yaml
  • keps/sig-windows/116-windows-node-support/kep.yaml
  • keps/sig-windows/689-windows-gmsa/kep.yaml

Less toilsome logic we could try instead of "latest-milestone < 1.21":

  • "last-updated" < whatever date 1.21 was
  • hardcode this list as an exemption list, and ignore the validation error for any of these KEPs
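
The exemption-list option could be a sketch as simple as this (the file name and format are made up):

```python
# Hypothetical exemption list: one kep.yaml path per line, for KEPs that
# predate the latest-milestone field; validation errors on them are ignored.
def load_exemptions(path="prr_exemptions.txt"):
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith("#")}

def prr_check_applies(kep_path, exemptions):
    return kep_path not in exemptions
```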

@spiffxp (Member) commented Oct 8, 2021

#2672 opted to mark everything as implementable and give a dummy "0.0" milestone; we could also add that milestone to everything currently marked as implemented...

@spiffxp (Member) commented Oct 8, 2021

I've opened #3007 to add the missing metadata and enable the PRR check for implemented KEPs.

justaugustus removed their assignment Dec 2, 2021
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Mar 2, 2022
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 1, 2022
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue.

In response to the triage robot's /close command above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

thockin moved this to "meta" in KEPs I am tracking Dec 20, 2022
thockin removed the lifecycle/rotten label Dec 21, 2022