From cfe59ef20f9fd4bf2fb25b5ec6b3324a29a4becd Mon Sep 17 00:00:00 2001
From: Stephen Augustus
Date: Thu, 21 Jan 2021 18:38:22 -0500
Subject: [PATCH 01/34] sig-release/2572-release-cadence: Initial commit

Signed-off-by: Stephen Augustus
---
 .../2572-release-cadence/README.md | 1196 +++++++++++++++++
 .../sig-release/2572-release-cadence/kep.yaml | 36 +
 2 files changed, 1232 insertions(+)
 create mode 100644 keps/sig-release/2572-release-cadence/README.md
 create mode 100644 keps/sig-release/2572-release-cadence/kep.yaml

diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md
new file mode 100644
index 00000000000..d1f23221c40
--- /dev/null
+++ b/keps/sig-release/2572-release-cadence/README.md
@@ -0,0 +1,1196 @@

# KEP-2572: Defining the Kubernetes Release Cadence

- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [FIXME - Discussion thread description](#fixme---discussion-thread-description)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
    - [FIXME - Ideas](#fixme---ideas)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
    - [FIXME - Support](#fixme---support)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
  - [FIXME - LTS](#fixme---lts)
  - [FIXME - Go faster](#fixme---go-faster)
  - [FIXME - No](#fixme---no)
  - [FIXME - Maintenance releases](#fixme---maintenance-releases)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
- [FIXME - Cleanup](#fixme---cleanup)
- [How do we make a decision?](#how-do-we-make-a-decision)
  - [Canonical](#canonical)
  - [Alternatives](#alternatives-1)
- [Do we have any data?](#do-we-have-any-data)
- [How do we implement?](#how-do-we-implement)
  - [Does that mandate a fixed frequency?](#does-that-mandate-a-fixed-frequency)
  - [Releases don’t necessarily have to be equally spaced](#releases-dont-necessarily-have-to-be-equally-spaced)
- [Leads meeting feedback](#leads-meeting-feedback)
  - [From Jeremy](#from-jeremy)
- [Comment, without decision](#comment-without-decision)
- [Needs response](#needs-response)
- [Implementation Details](#implementation-details)
- [Explanatory](#explanatory)
- [Data](#data)

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

## Motivation

### FIXME - Discussion thread description

What would you like to be added:

We should formally discuss whether or not it's a good idea to modify the kubernetes/kubernetes release cadence.

Why is this needed:

The extended release schedule for 1.19 will result in only three minor Kubernetes releases for 2020.

As a result, we've received several questions across a variety of platforms and DMs about whether the project is intending to only have three minor releases/year.

In an extremely scientific fashion, I took this question to a Twitter poll to get some initial feedback: https://twitter.com/stephenaugustus/status/1305902993095774210?s=20

Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four.

There's quite a bit of feedback to distill from that thread, so let's start aggregating opinions here.

Strictly my personal opinion:
I'd prefer three releases/year.

- less churn for external consumers
- one less quarterly Release Team to recruit for
- for a lot of folx, there are usually multiple things happening at the quarterly boundary which a new Kubernetes release can steal focus from
- we can collectively use the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts

@kubernetes/sig-release @kubernetes/sig-architecture @kubernetes/sig-testing
/assign
/milestone v1.20
/priority important-longterm

### Goals

### Non-Goals

#### FIXME - Ideas

@sftim:

> If there were an unsupported-but-tested Kubernetes release cut and published once a week - what would that mean?
>
> I'm imagining something that passes the conformance tests (little point otherwise) but comes with no guarantee. The Rust project has a model a bit like this with a daily unstable release which has nevertheless been through lots of automated testing.
>
> When I'm typing this I'm imagining that I could run `minikube start --weekly-unstable` and get a local test cluster based on the most recent release.
> If Kubernetes already had that built and working, would people pick different answers?

@jberkus:

> @sftim yeah, you've noticed that the reason, right now, we don't see a lot of community testing on alphas and betas is that we don't make them easy to consume.
>
> I'd say that it would need to go beyond that: we'd need images, minikube, and kubeadm releases for each weekly release.
>
> I don't know how that would affect our choice of major release cadence (isn't it orthogonal?) but it would be a great thing to do regardless. Also very hard.

## Proposal

### User Stories (Optional)

#### Story 1

#### Story 2

#### FIXME - Support

@markyjackson-taulia:

> I am a +1 for 3 releases/year. As you noted, this will allow us to work on items that can enhance things

@saschagrunert:

> My personal opinion is that three releases per year are enough. This means more time can be spent on actual feature development, without narrowing them down just to fit into the release cycle. It will also reduce the management overhead for SIG Release (and the release engineering).
>
> On the other hand it will give less people a chance to participate in the shadowing program for example. Anyways, I think we will find appropriate solutions around those kind of new challenges.

@mkorbi:

> I don’t have a strong preference on the one or the other. But from what I
> see from our clients or even from public cloud providers, 4 release seems
> to be a huge struggle.
> Therefore
> +1 4 3 (you get joke in it)

@timothysc:

> Huge +1. In a series of user surveys from SIG Cluster lifecycle, we've consistently found that users struggle to keep up with 4 releases a year and have data to show this.
>
> cc @neolit123

@neolit123:

> +1
>
> > Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four.
>
> i find these results surprising.
> we had the same question in the latest SIG CL survey and the most picked answer was "not sure". this tells me that users possibly do not understand all the implications or it does not matter to them.
>
> a couple of benefits that i see from the developer side:
>
> - with the yearly support KEP we get 3 releases to maintain
> - less e2e test jobs
>
> as long as SIG Arch are on board we should just proceed with the change.
>
> EDIT: to Tim's point
>
> > Huge +1. In a series of user surveys from SIG Cluster lifecycle, we've consistently found that users struggle to keep up with 4 releases a year and have data to show this.
>
> this however we did see, users struggle to upgrade.

@jeremyrickard:

> > less churn for external consumers
>
> *puts on external consumer hat* This would be super great. We really, really struggle as is. *takes off external consumer hat*
>
> > one less quarterly Release Team to recruit for
>
> *puts on release lead hat* This I am 100% in agreement on. When we kicked off 1.18 (because of the extra time over the new year and stuff), we had extra time to solicit shadows and as the enhancements lead for 1.18, I was really able to onboard my shadows early and hit the ground running. For 1.20, we didn't have a large amount of time between identifying an EA, 1.19 ending, and 1.20 starting so selecting the shadows and getting things rolling was a challenge, especially for the front loaded work Enhancements has to do. The one downside is that we will remove an opportunity for shadowing, and as we saw this time around we had >100 people apply, and this will remove ~24-ish opportunities. I think we can maybe identify some opportunities for folks that want to be involved though. *takes off release lead hat*
>
> I think in general, this gives people more time to do planning within their SIG, work on KEPs that fall out of that planning, and maybe work toward general health and well being of tests?
>
> Probably a lot to work out if we make that decision, but overall I am pretty strongly in favor of this.

@yahorse:

> With everything going on three releases is fine, we should be considerate of the challenges everyone has right now.

@frapposelli:

> Big +1 on the 3 releases/year cadence. As the project matures, having less churn becomes a very desirable feature :wink:

@ArangoGutierrez:

> ++ for 3 releases a year

@Klaven:

> +1 for 3.

@hasheddan:

> +1 for 3 releases

@ameukam:

> +1 for 3 releases a year.

@savitharaghunathan:

> Huge +1 for three releases a year. From an end user perspective it puts less toil on the teams to keep up with the upgrades and administrative work associated with it.

@benhemp:

> +1
>
> 3 as a consumer team. because almost guaranteed my team will have one quarter where we have to focus elsewhere. Two or one my fear is the machine of getting things to the finish line gets rusty if exercised too infrequently. 16 weeks to ship, extra 4 weeks buffer for holidays for the consuming teams also works out pretty well.

@wilsonehusin:

> +1 for 3 releases/year, agree with what many have stated well regarding churn
>
> as someone who started getting involved in this project through shadowing release team, I'd like to echo what @saschagrunert & @jeremyrickard raised above regarding shadow opportunities -- I'm glad we're acknowledging the downside and hope we can keep in mind to present other opportunities for folks to get involved!

@kcmartin:

> 3 Releases/year, I am on board with this!
>
> This option seems to open up a lot of opportunity to keep the pace more reasonable, and keep folks from burning out too quickly!
>
> As to the potential for limiting shadow opportunities (mentioned by @jeremyrickard, @wilsonehusin, and others), I'm definitely tuned in to that being a downside, since I've served as a SIG-Release shadow three times, and I think it's a fantastic opportunity!
>
> One possible way to alleviate that downside would be to have 5 shadows, instead of three or four, per sub-team. I believe this is still a manageable number for the Leads, and could distribute the work more evenly.

@bai:

> Big +1 on having 3 releases/year.

@LappleApple:

> Absolutely +1 for three releases per year, for reasons already stated here.

@palnabarun:

> +3000 for 3 releases a year. :heart_eyes_cat:

@recollir:

> +1 for 3 releases. It will help downstream projects to be able to keep up. I mainly think on all “installers” out there, but also all other cluster add-ons.

@SayakMukhopadhyay:

> +1 for 3 releases. It will also help some cloud providers playing catch-up to come to parity.

@leodido:

> +1 for 3 releases.
>
> Making users able to catch-up is more important than keeping a pace so fast that can lead nowhere (we experienced the same with https://github.com/falcosecurity/falco and we switched to 6 releases per year from 12).

@vincepri:

> Big +100 for 3 releases / year cadence.

@ncdc:

> +1 to fewer releases / year.

@fabriziopandini:

> +1 to 3 releases

@nader-ziada:

> +1 on having 3 releases/year

@oldthreefeng:

> +1 to 3 releases / year.

@cpanato:

> The bot replied for me, but this is myself
> #1290 (comment)
>
> +1 for 3 releases a year

@jackfrancis:

> +1 on 3 releases per year
>
> Thanks @justaugustus!

@akutz:

> I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. So a big +1 from me.

@pires:

> +1 on 3 releases a year.
>
> And as noted over Twitter, given someone's concerns on expecting same amount of changes over 25% less releases, I think it's of paramount importance for SIGs to step up and limit the things they include in a release, balancing what matters short/long-term and kicking out all that can be done outside of the release cycle (we have CRDs, custom API servers, scheduling plug-ins, and so on). Now, I understand it's hard, sometimes even painful, to manage the enthusiasm some like me have on things close to them they want to see gaining traction but the early days are gone and this is now a solid OSS project that requires mature contributors. I'll try and do my part.
>
> On a more personal note, (@jeremyrickard wink, wink) I applied for release shadow believing I'd be picked given my past contributions to the project and my justification to be selected over others. Being rejected was a humbling experience and I'm happy to let you know I didn't lose any of the appetite to contribute. Others may feel differently but, then again, the project is maturing and so should the community.

@mpbarrett:

> As more and more complex workloads move to Kube, having less upgrades in a single year is a good thing. Which months (i.e., April, August, Dec would be every 4 months from the beginning of a year) would be important for me as a user to know.

@OmerKahani:

> +1 for 3
>
> 3 is the maximum upgrades that we can do in my company. Our customers are eCommerce merchants, so from September to December (inclusive), we are in a freeze on all of the infrastructure.
>
> @tpepper for your question - the best month for us will be March, July, November-December.

@tasdikrahman:

> +1 on moving to 3 releases, this will definitely help us (end users) in keeping up with the release cadence, by reducing the toil and effort required by us to upgrade versions, if we move from 4 releases to 3.

@xmudrii:

> +1 for 3 releases a year.

@afirth:

> I guess most end users are blocked by their upstream distro's ability to keep up with the K8s release. For example, GKE rapid channel is currently on 1.18, but 1.19 released in August. Somebody previously mentioned kops has similar issues (also currently on 1.18). I'm curious whether this is because those providers routinely find issues, or because it takes some fixed time to implement the new capabilities and changes. Either way, I don't think this change would impact end users' ability to get new features in a timely fashion much. So, besides the reduced shadow capacity and the complexity of actually making the change, what are the downsides?
>
> +1 for me.

@Klaven:

> I see some people attributing drift to longer release cycles (we are only talking about extending them by a month, not 3 months), but I would argue that fast release cycles have caused their own amount of drift, never mind the burden on the release team.
>
> Look at GKE, for example. Versions 1.14 to 1.17 are supported. GKE is arguably one of the best Kubernetes providers and there is a LOT of drift because corporations don't like continuous rapid change and find it hard to support. I also think that as a project matures the rate of change of the increasingly stable and feature-complete core should decrease. At some point the plugins and the out-of-tree projects should be where more change happens. Projects like cluster-api and the like get the attention which used to be focused on maturing the core.
>
> I know that recently there has been a lot of focus on how many releases something needs in order to become GA. I think that honestly is the wrong approach.
>
> I do think it's valid to be concerned that the release of k8s is too much work. I would say this means this system is too laborious. Given that it is this much work, rushing it more would probably only hurt us more. It's obvious that the ecosystem has already felt this strain.
> If we want to be able to release frequently, we need the release process to become painless. If we don't fix that problem, I don't see any solution other than pushing releases to a manageable cadence.
>
> If we want to release quickly, we need to think not only about the release team, but also the downstream: the adopters. If we want to release quickly and frequently, then we need to focus on making the upgrade process even easier and similar things.
>
> As it is, I think quarterly is too much because of all the above issues. Sadly, I have not gotten as much time contributing to Kubernetes as I would like, so I will butt out of the conversation. I just wanted to put my thoughts down.

@yboujallab:

> +1 for 3 releases / year

@jberkus:

> @johnbelamaric everything you've said is valid. At the same time, though, my experience has been that the pressure goes the other way: features already linger in alpha or beta for way longer than they ought to. The push to get most features to GA -- or deprecate them -- really seems to be lacking. It's hard to pull stats for this, but most KEP-worthy features seem to take something like 2 years to get there. So from my perspective, more state changes per release would be a good thing (at least, more getting alpha features to beta/GA), even if we didn't change the number of releases per year.
>
> It's hard to tell whether or not switching to 3 releases a year would affect the slow pace of finishing features at all.

@ehashman:

> I was on leave for the past two weeks so chiming in a little late...
>
> I am +1 to 3 releases per year.
>
> While some folks in the thread note that this increases the heft/risk of each release, I actually think less releases will reduce risk. I'm speaking from an operations perspective as opposed to a development perspective.
>
> The current Kubernetes release cadence is so high that most organizations cannot keep up with making a major version update every 3 months regularly, or going out of security support in less than a year. While in theory, releasing more frequently reduces the churn and risk of each release, this is only true if end users are actually able to apply the upgrades.
>
> In my experience, this is very challenging and I have not yet seen any organization consistently keep up with the 3 month major upgrade pace for production clusters, especially at a large scale. So, what effectively happens is that end users upgrade less frequently than 3 months, but since that isn't supported, they end up in the situation where they are required to jump multiple major releases at once, which effectively results in much higher risk.
>
> 4 vs. 3 releases is >30% more release work, but I do not believe it provides benefit proportional to that work, nor does a quarterly major release cadence match the vast majority of operation teams' upgrade cycles.

### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations

## Design Details

### Test Plan

### Graduation Criteria

### Upgrade / Downgrade Strategy

### Version Skew Strategy

## Implementation History

## Drawbacks

## Alternatives

### FIXME - LTS

@kellycampbell:

> +1 One thing I'm wondering is how this would affect vendors and other downstream integrators. For example, we build our clusters using kops. It is normally at least one release behind the k8s releases. Would having an extra month help them sync up with the release? I imagine other integrators and cloud providers would also benefit from extra time to update.
>
> Additional point: we really are only able to upgrade our clusters about twice a year for various reasons not related to the k8s or kops release schedule.
> I see the maturing k8s as similar to OS upgrades such as Ubuntu, which releases twice a year and has an LTS every 4 releases or so. They are able to patch incrementally and continuously though. If k8s had a similar ability to apply incremental patches in a standard way such that 1.19.1 -> 1.19.2 is more or less automatic and not up to each vendor, that would be amazing.

@chris-short:

> I'm in favor of three releases a year. I like @jberkus comment about a Q4 maintenance release too without so much hoopla and fanfare.
>
> I hope folks aren't driven to only fix stuff in Q4. "Oh that's a gnarly one, wait until the end of the year" is something I could foresee someone thinking at some point, if we don't word things right about a maintenance release.
>
> One question I think has been lightly touched on is, "What about LTS releases?" (and I know this is out of scope but, I don't know where we stand on this atm)

@youngnick:

> The consensus on LTS (meaning multi-year support for a single version) is, in short, there's no consensus. We in the LTS WG worked for over two years, and we were able to get everyone to agree to extend the support window to one year (from nine months), which I think speaks to the passion that everyone has about this, the diversity of the use cases Kubernetes is supporting, and the community's determination to get it right.
>
> Speaking personally, I think that LTS is a long way away, if ever - it would require a lot more stability in the core than we have right now. With efforts like all the work to pull things out of tree, and the general movement towards adding new APIs outside of the core API group, I think it's plausible that one day, we may get to a place where we could consider it, but I don't think it's likely for some time, if ever. @tpepper, @jberkus, @LiGgit, and @dims among others may have thoughts here. :)

@jberkus:

> @youngnick you summed it up.
> Having 2 years of support for a specific API snapshot is unrealistic right now for all sorts of reasons, and it wasn't even clear that it was what people actually wanted.

### FIXME - Go faster

@sebgoa:

> Mostly a peanut gallery comment.
>
> The kubernetes releases have been a strong point of the software since its inception. The quality, testing and general care has been amazing and only improved (my point of reference is releases of some Apache foundation software). With the increased usage, scrutiny and complexity of the software it feels like each release is a huge effort for the release team so naturally less releases could mean a bit less work.
>
> Users and even cloud providers seem to struggle to keep up with the releases (e.g. 1.18 is not yet available on GKE for instance), so this also seems to indicate that less releases would ease the work of users and providers.
>
> But, generally speaking less releases (or less frequent minor releases) will also mean that each release will pack more weight, which means it will need even more testing and it will make upgrades tougher.
>
> With less releases developers will tend to rush their features at the last minute to "get it in" because the next one will be further apart.
>
> IMHO with more releases, developers don't need to rush their features, upgrades are more bite-sized and it necessarily pushes for even more automation.
>
> So at the risk of being down voted I would argue that we have worked over the last 15 years to agree that "release early, release often" was a good idea, that creating a tight feedback loop with devs, testers and users was a very good idea.
>
> Theoretically we should have processes in place to be able to automatically upgrade and be able to handle even a higher cadence of releases. I could see a future where people don't upgrade that often because there are less releases and then start to fall behind one year, then two...etc.
>
> PS: I understand this is almost a theoretical argument and that each release is a ton of work, I also know I am not helping the release team and I know 2020 is a very tough year.

@johnbelamaric:

> > I could see a future where people don't upgrade that often because there are less releases and then start to fall behind one year, then two...etc.
>
> Yes, this is a big fear of mine as well. We have worked hard to prevent vendor-based fragmentation (e.g., with conformance) and version-based fragmentation (with API round trip policies, etc). Bigger releases with riskier upgrades may undermine that work. We must avoid a Python2 -> 3 situation. This is also why we elected for a longer support cycle as opposed to an LTS. With the extensive ecosystem we have, fragmentation is extremely dangerous.
>
> I don't think going from 4->3 releases will create this problem, though I do think going to 2 or 1 release would. We need some plan around the mitigations I described earlier though, to ensure we avoid this fate.

### FIXME - No

@adrianotto:

> -1
>
> I acknowledge this proposed change will not slow the rate of change, but it does concentrate risk. It means that each release would carry more change, and more risk. It also means that adoption of those features will be slower, and that's bad for users.
>
> Release early and release often. This philosophy is a key reason k8s matured as quickly as it did. I accept that 2020 is a strange year, and should be handled as such. That is not a valid reason to change what is done in subsequent years. Each time you make a change like this, it has a range of unintended consequences, such as the risk packing I mentioned above. It would be tragic to cause an overall slowdown in the promotion of GA features, because they transition based on releases, not duration in use.
> If the release process is burdensome, we should be asking how we can apply our creativity to make it easier, and reducing the release frequency might be one of several options. But asking the question this way constrains us from looking at the bigger picture, and fully considering what will serve the community best.

@bowei:

> Echoing Adrian's comment:
>
> I think releases are a nice forcing function towards stabilization and having less releases will increase drift in the extra time.
> Are we coupling feature(s) stabilization to release cadence too much?
> One fear is that the work is simply going to be pushed rather than decrease, but now there are fewer "stabilization" points in the year.

@spiffxp:

> I'm a net -1 on 3 releases per year, but I understand I'm in the minority. Reducing the frequency of a risky/painful process does not naturally lead to a net reduction of pain or risk, and usually incentivizes increased risk. "Stabilize the patient" can be a good first step, but is insufficient on its own.
>
> To @tpepper's question of implementation, if we go with 3 symmetric releases, I would suggest using the "extra time" as a tech debt / process debt paydown phase at the beginning of each release cycle. Somewhat like how we left the milestone restriction in place at the beginning of the 1.20 release cycle. This would provide opportunity to pay down tech debt / process debt that involves large refactoring or breaking changes, the sort of work that is actively discouraged during the code freeze leading up to a release.
>
> I may have too narrow a view, but I have concerns that an April / August / December cadence puts undue pressure to land in August. I'm thinking of industries that typically go through a seasonal freeze in Q4. Shifting forward by a month (January / May / September) or two (February, June, October) may relieve some of that pressure, though it does cause one release to straddle Q4/Q1 in an awkward way.
>
> Another option is to declare Q4 what it has been in practice, a quieter time during which we're not actually going to push hard on a release, but I don't think that works as well with 3 releases vs. 4.

### FIXME - Maintenance releases

@youngnick:

> I agree with @spiffxp that whatever we end up doing, we should acknowledge that calendar Q4 is substantially quieter than other quarters, with US Kubecon rolling into US Thanksgiving, rolling into the December festive season.
>
> I think that any plan to change the release cadence needs to take that as a prime consideration, whether it's keeping four releases a year and marking the Q4 one as minimal features, spreading three releases across the year, or some other solution.

@jberkus:

> @spiffxp we've talked about making Q4 a "maintenance" release endlessly, but we've never actually implemented that.

@jayunit100:

> Sounds like Josh's comment is middle ground on the way to three: sure you get 4 releases but the fourth is only bug fixes, tests and stability.

## Infrastructure Needed (Optional)

## FIXME - Cleanup

---

## How do we make a decision?

### Canonical

Write a KEP

- Seek approval from:
  - SIG Release
  - SIG Architecture
  - SIG Testing
  - (maybe) Steering
- Set a lazy consensus timeout

### Alternatives

- A survey
  - What would this need to contain to be effective?
- Vote to stakeholders (SIG leads + Steering)
  - Is there precedent for this outside of elections?
  - Should this include subproject owners?
  - 1 (Strong disagree) to 5 (Strong agree)

## Do we have any data?

Primarily anecdotal from SIG Release members, vendors, and end users.

AI:

- (to Elana) What kind of data specifically are we looking for?
  - Who's the audience? End users or principals?
  - Should we just do this all of the time post-release?
- (to Josh) What did we discover regarding feature trajectory?

Thoughts:
I would want requesters to be very explicit about the kind of data we're interested in.
SIG Release and others can work on collection, but we need to make sure this isn't a continually moving target.

We're also starting from a disadvantage trying to compare our status quo to something we haven't tried for a sustained period of time.

## How do we implement?

TBD
AI: Expand

Thoughts:
I feel like less is going to change in the process than people think.

- Make the decision
- Set the schedule

### Does that mandate a fixed frequency?

Thoughts:
Roughly, yes.

- Release cycle
- Planning / stability phase
- Repeat

As a consumer, I'd be looking for some predictability in the schedule.

### Releases don’t necessarily have to be equally spaced

See point on predictability.

---

# Conversations

## [Leads meeting](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr) feedback

- Q: how are we making a decision?
- [comment]: 1.21 is the real EOY release as its schedule covers December
- Do we have any data?
- [comment]: does sig-arch, sig-release, steering, etc own the final decision?
- [comment]: separate question of how we implement; does that mandate a fixed freq?
  - Stick to schedule and just not cut a 4th release?
- [comment]: releases don’t necessarily have to be equally spaced
- [comment]: cadence doesn't feel like things get the attention they deserve, things always feel rushed. Not a lot of space for people to take a step back.
- [comment]: should upgrade testing improve? Be blocking?

### From Jeremy

John B:

> Ask in the meeting, how are we going to make the actual decision for 3 vs 4?
> are we going to vote? or will SIG Release just make the decision?
+
+Elana:
+
+> additional ask: can we send out a real survey to end users
+
+Daniel:
+
+> the concern about things "taking longer" to go stable because of the number of releases in beta also came up again, can we think of a way to handle this?
+> Do the three releases need to be evenly spaced?
+
+Aaron C:
+
+> Can we get more "implementation" details about how three releases would work?
+> Can we make upgrade jobs / tests blocking to make the upgrade between versions better?
+
+## Comment, without decision
+
+@sftim:
+
+> I look forward to using the mechanisms already in place (notably CustomResourceDefinition, but also things like scheduling plugins) to enhance the Kubernetes experience outside of the minor release cycle.
+>
+> A bit more decoupling, now that the investment is made to enable that, sounds good to me - and allows for minor releases of Kubernetes itself to become less frequent.
+
+## Needs response
+
+@aojea:
+
+> 3 releases is cool for development, but not for releasing something with a minimum level of quality.
+> We barely keep up with the tech debt we have in CI and testing, i.e., how many jobs have been failing for years without anybody noticing? How many bugs have been open for years? How many features have been in alpha or beta for years?
+> Each release cycle forces people to FIX things if they want to release; the more time between releases, the more technical debt you accumulate.
+> At least, in all my life I have never seen a project where reducing the release cycle didn't end with rushing everything in the last week, and honestly, I gave up believing that will ever be real.
+
+## Implementation Details
+
+@tpepper:
+
+> Can folks comment on how they'd prefer this to look operationally?
+>
+> There are operational benefits to consumers in having to do fewer upgrades (even with the small risk that a longer cadence means each upgrade is bigger), but that's also somewhat dependent on when those are presented. Should we maybe rotate the release points backwards or forwards to get away from having a release at a particular time when it's less consumable? We also need to consider how to make this beneficial or at least not super disruptive to community muscle memory.
+>
+> - symmetric four months active dev each, the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts spread across that. Are there specific benefits to picking any of:
+>   - releases in April, August, December?
+>   - releases in March, July, November?
+>   - releases in February, June, October?
+>   - releases in January, May, September?
+> - asymmetry with some explicit downtimes using the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts? E.g., late November, December, early January and also August have typically been slower times already.
+>   - quiet December, dev activating more January through April release
+>   - dev May through July release
+>   - August stability scrubbing
+>   - dev September through early December release
+> - some other explicit stability effort periods like taken in August 2020, formalized into the release cadence as a longer code freeze?
+>
+> Depending on how we lay out the annual calendar, we might also get some benefit (or complexity) relative to the golang release cycle and our need to update golang twice on average across the patch release lifecycle of each release. This may also compound relative to distributors' release cadences, support lifecycles, their stabilization and lead time between content selection and release, and their balancing of interoperability across a larger set of dependencies where those are tied to specific months on the annual calendar.
+>
+> There's not an easy answer here that's going to work right for everybody, but while lots of folks are +1'ing the abstract concept it would be good to capture additional constraints and ideas that are otherwise implicit when they make the +1.
+
+@neolit123:
+
+> Golang has a "symmetric" model, so I think k8s should do the same.
+> The "symmetric" choice, however, would require more discipline and availability from contributors, so my vote here is to try "symmetric" and if it fails (maybe after one year) go "asymmetric".
+
+@khenidak:
+
+> @tpepper it would be great if we added the typical KubeCon schedule(s) to this post, since most of the community (those who are reviewing, approving, building, releasing changes) is also heavily engaged in these events.
+
+@vincepri:
+
+> Trying to summarize general feedback I've gathered today here and there for visibility. This particular issue started with "slowing down", although it quickly became a reflection on why and what we can do about it:
+>
+> - Releases are hard; they need people to commit.
+> - There is not enough automation (true for probably all our projects and repositories).
+> - With each Kubernetes release, there is a world of clients that needs to be updated.
+> - Changes are hard to keep up with, and sometimes important things are buried in release notes.
+> - Some fear that fewer releases without policies (read: saying "no" more) isn't enough.
+> - 2020
+>
+> Taking a step back, a few people are suggesting fixing some of these problems from a technical perspective (which is good in itself) and we should prioritize these efforts. From the other side, there is a general sentiment that we need to slow down for the sake of this community's health.
+>
+> These are both valid, and agreeing on a slower cadence is just the first step; going forward we should normalize taking a few steps back, reflecting, and course-correcting when things are becoming unsustainable.
+
+@cblecker:
+
+> Can folks comment on how they'd prefer this to look operationally?
+>
+> We should also talk about how this may impact things like code freeze (longer feature freeze with only bug/scale/stability fixes?).
+>
+> I'm +1 to 3 releases in general, but the details obviously matter. I'd also love to see this in a KEP if we come to consensus on making a change.
+
+@jberkus:
+
+> So, some pros/cons not previously mentioned on this thread:
+>
+> Additional Pros:
+>
+> - Easier to schedule releases because we can fudge dates more to avoid KubeCons and holidays
+> - Reduced number of E2E, conformance, skew, and upgrade test jobs
+> - Means that new 1+ year patch support doesn't result in additional patch releases
+> - We'll get to 1.99 slower so we won't run out of digits
+>
+> Cons:
+>
+> - Increased pressure by feature authors to get their feature in this release as opposed to waiting.
+> - Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze
+> - Extra problems with upstream dependency patch support if our timing is bad
+>
+> Regarding Tim's question of exact implementation:
+>
+> My vote is for symmetric releases in April, August, and December. While it's tempting to make December an "off month", development does happen all the time, and if it's not on a release, what is it on?
+>
+> That would be a reason for my 2nd choice, which would be symmetric March, July, November, which puts Slow December at the beginning of the cycle instead of the end. However, that's mainly a benefit for working around KubeCon November, and there's no good reason to believe that KubeCon will be happening in November 2 years from now; it might be September or October instead.
+
+@alculquicondor:
+
+> > Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze
+>
+> I think this risk can be reduced if we adjust (increase) the code freeze period.
+
+@jberkus:
+
+> > Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze
+> >
+> > I think this risk can be reduced if we adjust (increase) the code freeze period.
+>
+> Better, how about we keep on top of flakes even if it's not Code Freeze?
+
+@alculquicondor:
+
+> > Better, how about we keep on top of flakes even if it's not Code Freeze?
+>
+> Absolutely, but it's not easy to enforce without hurting development of non-violators.
+
+@jberkus:
+
+> Shutting down merges is the nuclear option for preventing flakes and fails. We should be able to keep on top of them without resorting to that. But ... we're getting off topic here, unless folks think the "increased time to flake" is a blocker for this (I don't).
+
+@johnbelamaric:
+
+> I am tentatively in favor of 3 releases per year, primarily because I believe 4 releases per year is too hard for folks to consume. Even 3 releases per year is probably too much for most, but the downsides of fewer releases make anything fewer than 3 too risky in my mind.
+>
+> As I see it, those downsides are some things already mentioned above:
+>
+> 1. Buildup of too much content in the release, and consequent potential for more painful upgrades.
+> 2. Very long lead time to get a feature to GA through the alpha/beta/stable phases.
+>
+> Before making this decision, I think we need mitigations for these. Those mitigations have extensive ripples in how we do our development.
+>
+> For (1), some mitigations are:
+>
+> - (a) More development out-of-tree / decoupling more components
+> - (b) SIGs saying "no" more
+> - (c) Stricter admission criteria in the release (higher bar from SIG Release, SIG Testing, PRR, WG Reliability, SIG Scalability, etc.)
+>
+> Of course, some of these mitigations might make (2) worse. Other ideas?
+>
+> For (2), we may want to think about how we do our feature graduation process. Having three stages to go through and one less release per year to do it will stretch out how long it takes to add a feature quite substantially. This is a big topic we wouldn't want to gate on, but we may want to have a plan for it before moving forward. Some options others have mentioned to me:
+>
+> - (a) Features are in, or out. That is, straight to GA, but with features not admitted to the main build until they are ready. This means we need some alternate build and perhaps dev or feature branches.
+> - (b) Two stages instead of three. I am very skeptical of this, but there is some support for this in how K8s is actually used today. We did an operator survey in the PRR subproject and we found that:
+>   - More than 90% of surveyed orgs allow (by policy) all Beta and GA features in prod.
+>   - Less than 10% of surveyed orgs have ever disabled a beta feature in prod. The caveat is that both operators with more than 10,000 nodes under management that answered the survey have done this.
+>
+> This would indicate that people already treat beta as GA, for the most part. That's not necessarily a good thing, but it is a fact. Of course, the big benefit for us as contributors is that betas can be changed if we have made a mistake. So again we probably wouldn't want to use the current bar for beta and just map it to GA. We would need to raise that quite a bit.
+>
+> With respect to alpha, the idea there is to gather feedback. I think we get some limited feedback with it, but is it enough? If alpha and beta are not really serving their intended purpose, are they really that useful, as currently defined?
+>
+> Another option for (2) is breaking up the monolith more and allowing components to release independently. However, this could make test coverage nearly impossible, as individual components would need to be tested in various versions. Given that K8s is operated independently by thousands of organizations, I don't think we can treat the core components as completely independent. Nonetheless, there may be some opportunities for decomposition (like we did with CoreDNS, for example). @thockin mentioned kubelet and kube-proxy in this regard.
+>
+> See also (there are probably many more issues like these):
+>
+> - kubernetes/community#567
+> - kubernetes/community#4000
+
+@jberkus:
+
+> > For (2), we may want to think about how we do our feature graduation process.
Having three stages to go through and one less release per year to do it will stretch out how long it takes to add a feature quite substantially.
+>
+> Does it really, though? How many features actually went from alpha to GA in 9 months?
+>
+> Does anyone have hard data on this?
+
+@johnbelamaric:
+
+> > Does it really, though? How many features actually went from alpha to GA in 9 months?
+> >
+> > Does anyone have hard data on this?
+>
+> OK, that's a fair point to challenge that assumption. I agree probably most don't do it in 9 months, but data would help. I am not sure if "months" is the right measure, though. My concern is that people will still take the same number of releases to make it happen, which means it will take longer.
+>
+> On a similar note, I am curious if there is any data on the amount of feedback alpha features actually get.
+
+@jberkus:
+
+> Yah, and "Is there some outstanding blocker that prevents new features from actually going from alpha to GA in 3 releases for any reason other than maturity?"
+>
+> That is, if a feature can go from alpha to GA in 3 releases, that's fine. That's a year, and do we really want to make the case that it should take less than a year to get a new feature to GA? BUT ... if something in our process means that features realistically can't be introduced in 1.23 and go to beta in 1.24, then we have a potential problem, because that timeline gets very long if it's gonna actually take you 5 releases.
+
+@johnbelamaric:
+
+> The two points I bring up are in tension with each other. That is, the same
+> features spread across fewer annual releases automatically means more per
+> release, or longer duration in the pipeline.
+
+@bowei:
+
+> We should make sure that there are sufficient improvements/metrics/goals to meet w/ any change (or no change).
+> It wouldn't be great if 4 -> 3 didn't improve things and the same rationales would justify 3 -> 2.
+> Is there a bar where we can comfortably go back from 3 -> 4?
+
+@onlydole:
+
+> +1 for three releases a year, and all of this discussion is fantastic!
+>
+> I agree with there being three releases a year. However, I do think that having more regular minor version releases would be helpful, so there isn’t any rush to get things into a specific release, nor a blocker around shipping bugfixes or improvements.
+>
+> I’d like to propose a strongly scoped path for Alpha, Beta, and GA features. I believe in allowing a bit more leniency for Alpha and Beta code promotion, and more stringent requirements for features before they reach GA status.
+
+## Explanatory
+
+@MIhirMishra:
+
+> What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA.
+> More important is what is in the release than how frequently you are releasing.
+
+@johnbelamaric:
+
+> > What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA.
+>
+> Not sure I understand the question. No one is suggesting deciding in advance; features will be advanced in their stage when they are ready. But the thing is, "when they are ready" depends on the release cadence. In order to go to beta, a feature has to have at least one alpha. In fact, realistically there will be more than one release with it in alpha, since it's really difficult to get meaningful feedback with a single cycle of alpha. Arguably, this becomes a little easier with longer time between releases, but realistically going from alpha release, to availability in downstream products, to real usage, to feedback and updated design and development is pretty hard to squeeze in before code freeze for the next release.
+>
+> Another way to think about this is that every feature goes through three state transitions:
+>
+> - inception -> alpha
+> - alpha -> beta
+> - beta -> GA
+>
+> Thus, the minimum number of releases to get from inception to GA is three: about 9 months now versus 12 with the proposed schedule. Now, it is the rare feature that would be able to do this in 9 months, because we generally need more than one release of alpha, and probably of beta too, to get a decent signal on quality.
+>
+> At the same time, a constant level of effort exerted on K8s would mean that the same number of features could have state transitions in the same amount of time. With fewer releases per year, that means more state transitions per release.
+>
+> That is, elongating the cycle creates more content (transitions) per release, and a longer time for a given feature to transition through all the states. Higher latency (in terms of time) with more throughput (in terms of releases).
+>
+> We don't want higher latency with more throughput, because more throughput means riskier and more difficult upgrades, and higher latency means more pain for developers and their customers.
+>
+> So, what are the mitigations? They amount to reducing the number of transitions per release (to address my (1) above) and reducing the number of transitions per feature (to address my (2) above).
+>
+> Reducing the transitions per release can be done by:
+>
+> - peeling features off of the monolithic release by pushing them out-of-tree (decomposition)
+> - SIGs saying "no" more (which pushes things out-of-tree, most likely)
+> - requiring a higher bar for a state transition, thus making the effort involved to get to the next stage higher
+>
+> Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar for state transitions; this dovetails with our goal per release. Another possibility is to classify features into low- and high-risk features.
Low risk features could go straight to beta and skip the alpha phase, for example. + +## Data + +@jberkus: + +> All: I'm going to research what actual feature trajectory looks like through Kubernetes, because @johnbelamaric has identified that as a critical question. Stats to come. + +@ehashman: + +> @jberkus Any updates on the stats? :) + +@jberkus: + +> Nope, got bogged down with other issues, and the question of "what is a feature" in Kubernetes turns out to be a hard one to answer. We don't actually track features outside of a single release cycle; we track KEPs, which can either be part of a feature or the parent of several features, but don't match up 1:1 as features. So first I need to invent a way to identify "features" in a way that works for multiple release cycles. + +@jberkus + +> Sorry this has been forever, but answering the question of "how fast do our features advance" turns out to be really hard, because there is literally no object called a "feature" that persists reliably through multiple releases. +> +> To reduce the scope of the problem, I decided to limit this to tracked Enhancements introduced as alpha in 1.12 or 1.13, which were particularly fruitful releases for new features. Limiting it to Tracked kind of limits it to larger features, but I think these are the only ones required to go through alpha/beta/stable anyway (yes/no?). 
So, in 1.12 and 1.13:
+>
+> - 20 new enhancements were introduced
+> - 7 did not follow an alpha/beta/stable path, mostly because they were removed or broken up into other features
+> - 2 are still beta
+> - 1 advanced in minimum time, that is 1 release alpha, 1 beta, then stable, in 9 months
+> - 4 advanced from alpha to beta in 1 release, but then took 2 or more releases to go to stable
+> - 7 advanced more slowly
+>
+> Our median number of releases for an enhancement to progress is:
+>
+> - Alpha to Beta: 2 releases
+> - Beta to Stable: 3 releases
+> - Alpha to Stable: 6 releases
+>
+> Given this, it does not look like moving to 3 releases a year would slow down feature development due to the alpha/beta/stable progression requirements.
+>
+> I will note, however, that for many enhancements nagging by the Release Team during each release cycle did provide a goad to resume stalled development. So I'm at least a little concerned that if we make 3/year and don't change how we monitor feature progression at all, it will still slow things down because devs are being nagged less often.
+
+@ehashman
+
+> I am a little less worried about the "nag factor" now that we've moved to push-driven enhancements this release (SIG leads track, enhancements team accepts vs. enhancements team tracks).
+
+@jberkus
+
+> I'm very worried about it, see new thread.
+
+@johnbelamaric
+
+> I am more concerned about the "nag factor" due to the move to push-driven
+> development; I think the lack of nagging will slow some features down.
+> However, there are new things at play that will help with it, namely the
+> "no more permabeta" where things can't linger in beta forever because their
+> API gets turned off automatically. At least for things with APIs.
+>
+> I strongly believe our alpha-to-stable latency, already quite high, will
+> get worse with 3 releases per year. But ultimately, it's up to the feature
+> authors. If they want it to go fast enough, they'll have to push for it
+> more.
Missing a train will have higher cost.
+>
+> Anyway, I personally am, as I said before, ambivalent on this decision.
+> Putting on my GKE hat, it makes my life easier. Putting on my OSS hat, I
+> have concerns but nothing that would make me strongly oppose it. It's not a
+> change I would push for, but I think it's reasonable to see what happens if
+> we try it.
+>
+> And thank you Josh for the analysis. The median 6 releases goes from 18
+> months to 24 months, which is not great but also something that is not
+> forced on feature authors; they could push and get it done in less time if
+> they need to. It's a rare feature that was making it in the 9 months
+> before, so it would be a rare feature that is forced to have a longer cycle
+> than they would have otherwise.
+
+@jberkus
+
+> I'm going to start a new thread on nag factors, because these are a bigger deal than I think folks want to address.
+>
+> There are two areas in the project, currently, that are almost entirely dependent on release team nagging (hereafter RTN) for development:
+>
+> - Getting features to GA (and to a lesser degree, to beta)
+> - Fixing test failures and flakes
+>
+> With the current 3-month cycle, this means that for 1 month of every cycle RTN doesn't happen, and as a result, these two activities don't happen. This is an extremely broken system. Contributors should not be dependent on nagging of any sort to take care of these project priorities, and the project as a whole shouldn't be depending on the RT for them except during Code Freeze.
+>
+> A 4-month cycle will make this problem worse, because we'll be looking at 2 months of every cycle where RTN won't happen, a doubling of the amount of time per year for tests to fail and alpha features to be forgotten.
+>
+> I am not saying that this is a reason NOT to do a 4-month cycle. I am saying that switching to a 4-month cycle makes fixing our broken RTN-based system an urgent priority. Fixing failing tests needs to happen year-round.
Reviewing features for promotion needs to happen at every SIG meeting. +> +> (FWIW, this is an issue that every large OSS project faces, which is why Code Freeze is to awful in so many projects) diff --git a/keps/sig-release/2572-release-cadence/kep.yaml b/keps/sig-release/2572-release-cadence/kep.yaml new file mode 100644 index 00000000000..8780c3e1a2c --- /dev/null +++ b/keps/sig-release/2572-release-cadence/kep.yaml @@ -0,0 +1,36 @@ +title: Defining the Kubernetes Release Cadence +kep-number: 2572 +authors: + - "@justaugustus" +owning-sig: sig-release +participating-sigs: + - sig-architecture + - sig-testing +status: provisional +creation-date: 2021-01-21 +reviewers: + - "@BenTheElder" + - "@derekwaynecarr" + - "@dims" + - "@hasheddan" + - "@jeremyrickard" + - "@johnbelamaric" + - "@spiffxp" + - "@stevekuznetsov" +approvers: + - "@LappleApple" + - "@saschagrunert" + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.22" + +# The milestone at which this feature was, or is targeted to be, at each stage. 
+milestone: + alpha: "v1.22" + beta: "v1.23" + stable: "v1.25" From 31da69677f79c9382035996f16025f377fe531ba Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 01:34:30 -0400 Subject: [PATCH 02/34] 2572-release-cadence: Move raw discussion notes under KEP headers Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 834 +++++++++--------- 1 file changed, 417 insertions(+), 417 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index d1f23221c40..aa2b32bd12c 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -48,18 +48,25 @@ SIG Architecture for cross-cutting KEPs). - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) - - [FIXME - Discussion thread description](#fixme---discussion-thread-description) + - [FIXME](#fixme) + - [Data](#data) - [Goals](#goals) + - [FIXME Does that mandate a fixed frequency?](#fixme-does-that-mandate-a-fixed-frequency) + - [FIXME Releases don’t necessarily have to be equally spaced](#fixme-releases-dont-necessarily-have-to-be-equally-spaced) - [Non-Goals](#non-goals) - - [FIXME - Ideas](#fixme---ideas) + - [FIXME Ideas](#fixme-ideas) + - [FIXME Comment, without decision](#fixme-comment-without-decision) + - [FIXME Needs response](#fixme-needs-response) + - [FIXME Explanatory](#fixme-explanatory) - [Proposal](#proposal) - [User Stories (Optional)](#user-stories-optional) - [Story 1](#story-1) - [Story 2](#story-2) - - [FIXME - Support](#fixme---support) + - [FIXME Support](#fixme-support) - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) + - [FIXME Implementation Details](#fixme-implementation-details) - [Test Plan](#test-plan) - [Graduation Criteria](#graduation-criteria) - [Upgrade / Downgrade 
Strategy](#upgrade--downgrade-strategy) @@ -67,26 +74,21 @@ SIG Architecture for cross-cutting KEPs). - [Implementation History](#implementation-history) - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) - - [FIXME - LTS](#fixme---lts) - - [FIXME - Go faster](#fixme---go-faster) - - [FIXME - No](#fixme---no) - - [FIXME - Maintenance releases](#fixme---maintenance-releases) + - [FIXME](#fixme-1) + - [LTS](#lts) + - [Go faster](#go-faster) + - [No](#no) + - [Maintenance releases](#maintenance-releases) - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) -- [FIXME - Cleanup](#fixme---cleanup) -- [How do we make a decision?](#how-do-we-make-a-decision) - - [Canonical](#canonical) - - [Alternatives](#alternatives-1) -- [Do we have any data?](#do-we-have-any-data) -- [How do we implement?](#how-do-we-implement) - - [Does that mandate a fixed frequency?](#does-that-mandate-a-fixed-frequency) - - [Releases don’t necessarily have to be equally spaced](#releases-dont-necessarily-have-to-be-equally-spaced) -- [Leads meeting feedback](#leads-meeting-feedback) - - [From Jeremy](#from-jeremy) -- [Comment, without decision](#comment-without-decision) -- [Needs response](#needs-response) -- [Implementation Details](#implementation-details) -- [Explanatory](#explanatory) -- [Data](#data) +- [FIXME Cleanup](#fixme-cleanup) + - [How do we make a decision?](#how-do-we-make-a-decision) + - [Canonical](#canonical) + - [Alternatives](#alternatives-1) + - [Do we have any data?](#do-we-have-any-data) + - [How do we implement?](#how-do-we-implement) + - [Conversations](#conversations) + - [Leads meeting feedback](#leads-meeting-feedback) + - [From Jeremy](#from-jeremy) ## Release Signoff Checklist @@ -159,7 +161,7 @@ demonstrate the interest in a KEP within the wider Kubernetes community. 
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports --> -### FIXME - Discussion thread description +### FIXME What would you like to be added: @@ -189,6 +191,94 @@ I'd prefer three releases/year. /milestone v1.20 /priority important-longterm +#### Data + +@jberkus: + +> All: I'm going to research what actual feature trajectory looks like through Kubernetes, because @johnbelamaric has identified that as a critical question. Stats to come. + +@ehashman: + +> @jberkus Any updates on the stats? :) + +@jberkus: + +> Nope, got bogged down with other issues, and the question of "what is a feature" in Kubernetes turns out to be a hard one to answer. We don't actually track features outside of a single release cycle; we track KEPs, which can either be part of a feature or the parent of several features, but don't match up 1:1 as features. So first I need to invent a way to identify "features" in a way that works for multiple release cycles. + +@jberkus + +> Sorry this has been forever, but answering the question of "how fast do our features advance" turns out to be really hard, because there is literally no object called a "feature" that persists reliably through multiple releases. +> +> To reduce the scope of the problem, I decided to limit this to tracked Enhancements introduced as alpha in 1.12 or 1.13, which were particularly fruitful releases for new features. Limiting it to Tracked kind of limits it to larger features, but I think these are the only ones required to go through alpha/beta/stable anyway (yes/no?). 
So, in 1.12 and 1.13:
+>
+> - 20 new enhancements were introduced
+> - 7 did not follow an alpha/beta/stable path, mostly because they were removed or broken up into other features
+> - 2 are still beta
+> - 1 advanced in minimum time, that is 1 release alpha, 1 beta, then stable, in 9 months
+> - 4 advanced from alpha to beta in 1 release, but then took 2 or more releases to go to stable
+> - 7 advanced more slowly
+>
+> Our median number of releases for an enhancement to progress is:
+>
+> - Alpha to Beta: 2 releases
+> - Beta to Stable: 3 releases
+> - Alpha to Stable: 6 releases
+>
+> Given this, it does not look like moving to 3 releases a year would slow down feature development due to the alpha/beta/stable progression requirements.
+>
+> I will note, however, that for many enhancements nagging by the Release Team during each release cycle did provide a goad to resume stalled development. So I'm at least a little concerned that if we make 3/year and don't change how we monitor feature progression at all, it will still slow things down because devs are being nagged less often.
+
+@ehashman
+
+> I am a little less worried about the "nag factor" now that we've moved to push-driven enhancements this release (SIG leads track, enhancements team accepts vs. enhancements team tracks).
+
+@jberkus
+
+> I'm very worried about it, see new thread.
+
+@johnbelamaric
+
+> I am more concerned about the "nag factor" due to the move to push-driven
+> development; I think the lack of nagging will slow some features down.
+> However, there are new things at play that will help with it, namely the
+> "no more permabeta" where things can't linger in beta forever because their
+> API gets turned off automatically. At least for things with APIs.
+>
+> I strongly believe our alpha-to-stable latency, already quite high, will
+> get worse with 3 releases per year. But ultimately, it's up to the feature
+> authors. If they want it to go fast enough, they'll have to push for it
+> more.
Missing a train will have higher cost.
+>
+> Anyway, I personally am, as I said before, ambivalent on this decision.
+> Putting on my GKE hat, it makes my life easier. Putting on my OSS hat, I
+> have concerns but nothing that would make me strongly oppose it. It's not a
+> change I would push for, but I think it's reasonable to see what happens if
+> we try it.
+>
+> And thank you Josh for the analysis. The median 6 releases goes from 18
+> months to 24 months, which is not great but also something that is not
+> forced on feature authors; they could push and get it done in less time if
+> they need to. It's a rare feature that was making it in the 9 months
+> before, so it would be a rare feature that is forced to have a longer cycle
+> than they would have otherwise.
+
+@jberkus
+
+> I'm going to start a new thread on nag factors, because these are a bigger deal than I think folks want to address.
+>
+> There are two areas in the project, currently, that are almost entirely dependent on release team nagging (hereafter RTN) for development:
+>
+> - Getting features to GA (and to a lesser degree, to beta)
+> - Fixing test failures and flakes
+>
+> With the current 3-month cycle, this means that for 1 month of every cycle RTN doesn't happen, and as a result, these two activities don't happen. This is an extremely broken system. Contributors should not be dependent on nagging of any sort to take care of these project priorities, and the project as a whole shouldn't be depending on the RT for them except during Code Freeze.
+>
+> A 4-month cycle will make this problem worse, because we'll be looking at 2 months of every cycle where RTN won't happen, a doubling of the amount of time per year for tests to fail and alpha features to be forgotten.
+>
+> I am not saying that this is a reason NOT to do a 4-month cycle. I am saying that switching to a 4-month cycle makes fixing our broken RTN-based system an urgent priority. Fixing failing tests needs to happen year-round.
Reviewing features for promotion needs to happen at every SIG meeting. +> +> (FWIW, this is an issue that every large OSS project faces, which is why Code Freeze is so awful in so many projects) + ### Goals +#### FIXME Does that mandate a fixed frequency? + +Thoughts: +Roughly, yes. + +- Release cycle +- Planning / stability phase +- Repeat + +As a consumer, I'd be looking for some predictability in the schedule. + +#### FIXME Releases don’t necessarily have to be equally spaced + +See point on predictability. + ### Non-Goals -#### FIXME - Ideas +#### FIXME Ideas @sftim: @@ -221,6 +326,60 @@ and make progress. > > I don't know how that would affect our choice of major release cadence (isn't it orthagonal?) but it would be a great thing to do regardless. Also very hard. +#### FIXME Comment, without decision + +@sftim: + +> I look forward to using the mechanisms already in place (notably CustomResourceDefinition, but also things like scheduling plugins) to enhance the Kubernetes experience outside of the minor release cycle. +> +> A bit more decoupling, now that the investment is made to enable that, sounds good to me - and allows for minor releases of Kubernetes itself to become less frequent. + +#### FIXME Needs response + +@aojea: + +> 3 releases is cool for development, but not for releasing something with a minimum level of quality. +> We barely keep up with the tech debt we have in CI and testing, ie, how many jobs are failing for years that nobody noticed?, how many bugs are open for years? how many features are in alpha,beta for years? +> Each release cycle forces people to FIX things if they want to release, the more time to release the more technical debt that you accumulate. +> At least in all my life I never see a project that reducing the release cycle you don't end rushing everything for last week and honestly, I gave up believing that will be real some time. + +### FIXME Explanatory + +@MIhirMishra: + +> What is the need to decide in advance ?
Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA. +> More important is what is in the release than how frequently you are releasing. + +@johnbelamaric: + +> What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA. +> +> Not sure I understand the question. No one is suggesting deciding in advance - features will be advanced in their stage when they are ready. But the thing is, "when they are ready" depends on the release cadence. In order to go to beta, a feature has to have at least one alpha. In fact, realistically there will be more than one release with it in alpha, since it's really difficult to get meaningful feedback with a single cycle of alpha. Arguably, this becomes a little easier with longer time between releases, but realistically going from alpha release, to availability in downstream products, to real usage, to feedback and updated design and development is pretty hard to squeeze in before code freeze for the next release. +> +> Another way to think about this is that every feature goes through three state transitions: +> +> inception -> alpha +> alpha -> beta +> beta -> GA +> +> Thus, the minimum number of releases to get from inception to GA is three - about 9 months now versus 12 with the proposed schedule. Now, it is the rare feature that would be able to do this in 9 months, because we generally need more than one release of alpha, and probably for beta too to get a decent signal on quality. +> +> At the same time, a constant level of effort exerted on K8s would mean that the same number of features could have state transitions in the same amount of time. With fewer releases per year, that means more state transitions per release.
+> +> That is, elongating the cycle creates: more content (transitions) per release, and longer time for a given feature to transition through all the states. Higher latency (in terms of time) with more throughput (in terms of releases). +> +> We don't want higher latency with more throughput. Because more throughput means riskier and more difficult upgrades and higher latency means more pain for developers and their customers. +> +> So, what are the mitigations? They amount to reducing the number of transitions per release (to address my (1) above) and reducing the number of transitions per feature (to address my (2) above). +> +> Reducing the transitions per release can be done by: +> +> peeling features off of the monolithic release by pushing them out-of-tree (decomposition) +> SIGs saying "no" more (which pushes things out-of-tree, most likely) +> Requiring a higher bar for a state transition, thus making the effort involved to get to the next stage higher +> +> Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar higher for state transition - this dovetails with our goal per release. Another possibility is to classify features into low and high risk features. Low risk features could go straight to beta and skip the alpha phase, for example. + ## Proposal -### Test Plan +### FIXME Implementation Details - +@khenidak: -### Graduation Criteria +> @tpepper would be great if we add to this post typical kubecon(s) schedule, since most of the community (those who are reviewing, approving, building, releasing changes) is also heavily engaged in these events. - + +### Graduation Criteria + + -### FIXME - LTS +### FIXME + +#### LTS @kellycampbell: @@ -682,7 +1026,7 @@ information to express the idea and why it was not acceptable. > @youngnick you summed it up.
Having 2 years of support for a specific API snapshot is unrealistic right now for all sorts of reasons, and it wasn't even clear that it was what people actually wanted. -### FIXME - Go faster +#### Go faster @sebgoa: @@ -712,7 +1056,7 @@ information to express the idea and why it was not acceptable. > > I don't think going from 4->3 releases will create this problem, though I do think going to 2 or 1 release would. We need some plan around the mitigations I described earlier though, to ensure we avoid this fate. -### FIXME - No +#### No @adrianotto: @@ -740,7 +1084,7 @@ information to express the idea and why it was not acceptable. > > Another option is to declare Q4 what it has been in practice, a quieter time during which we're not actually going to push hard on a release, but I don't think that works as well with 3 releases vs. 4. -### FIXME - Maintenance releases +#### Maintenance releases @youngnick: @@ -764,13 +1108,11 @@ new subproject, repos requested, or GitHub details. Listing these here allows a SIG to get the process for these resources started right away. --> -## FIXME - Cleanup - ---- +## FIXME Cleanup -## How do we make a decision? +### How do we make a decision? -### Canonical +#### Canonical Write a KEP @@ -781,7 +1123,7 @@ Write a KEP - (maybe) Steering - Set a lazy consensus timeout -### Alternatives +#### Alternatives - A survey - What would this need to contain to be effective? @@ -790,7 +1132,7 @@ Write a KEP - Should this include subproject owners? - 1 (Strong disagree) - 5 (Strong agree) -## Do we have any data? +### Do we have any data? Primarily anecdotal from SIG Release members, vendors, and end users. @@ -807,7 +1149,7 @@ SIG Release and others can work on collection, but we need to make sure this isn We're also starting from a disadvantage trying to compare our status quo to something we haven't tried for a sustained period of time. -## How do we implement? +### How do we implement? 
TBD AI: Expand @@ -818,26 +1160,9 @@ I feel like less is going to change in the process than people think. - Make the decision - Set the schedule -### Does that mandate a fixed frequency? +### Conversations -Thoughts: -Roughly, yes. - -- Release cycle -- Planning / stability phase -- Repeat - -As a consumer, I'd be looking for some predictability in the schedule. - -### Releases don’t necessarily have to be equally spaced - -See point on predictability. - ---- - -# Conversations - -## [Leads meeting](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr) feedback +#### [Leads meeting](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr) feedback - Q: how are we making a decision? - [comment]: 1.21 is the real EOY release as its scheduler covers december @@ -849,7 +1174,7 @@ See point on predictability. - [comment]: cadence doesn't feel like things get the attention they deserve, things always feel rushed. Not a lot of space for people to take a step back. - [comment]: should upgrade testing improve? Be blocking? -### From Jeremy +#### From Jeremy John B: @@ -869,328 +1194,3 @@ Aaron C: > Can we get more "implementation" details about how three releases would word? > Can we make upgrade jobs / tests blocking to make the upgrade between versions better - -## Comment, without decision - -@sftim: - -> I look forward to using the mechanisms already in place (notably CustomResourceDefinition, but also things like scheduling plugins) to enhance the Kubernetes experience outside of the minor release cycle. -> -> A bit more decoupling, now that the investment is made to enable that, sounds good to me - and allows for minor releases of Kubernetes itself to become less frequent. - -## Needs response - -@aojea: - -> 3 releases is cool for development, but not for releasing something with a minimum level of quality. 
-> We barely keep with the tech debt we have in CI and testing, ie, how many jobs are failing for years that nobody noticed?, how many bugs are open for years? how many features are in alpha,beta for years? -> Each release cycle force people to FIX things if they want to release, the more time to release the more technical debt that you accumulate. -> At least in all my life I never see a project that reducing the release cycle you don't end rushing everything for last week and honestly, I gave up believing that will be real some time. - -## Implementation Details - -@tpepper: - -> Can folks comment on how they'd prefer this to look operationally? -> -> There are operational benefits to consumers in having to do less upgrades (even with the small risk that a longer cadence means each upgrade is bigger), but that's also somewhat depending on when those are presented. Should we maybe rotate the release points backwards or forwards to get away from having a release at a particular time when it's less consumable? We also need to consider how to make this beneficial or at least not super disruptive to community muscle memory. -> -> symmetric four months active dev each, the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts spread across that. Are there specific benefits to picking any of: -> releases in April, August, December? -> releases in March, July, November? -> releases in February, June, October? -> releases in January, May, September? -> asymmetry with some explicit downtimes using the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts? Eg: late November, December, early January and also August have typically been slower times already. 
-> quiet December, dev activating more January through April release -> dev May through July release -> August stability scrubbing -> dev September through early December release -> some other explicit stability effort periods like taken in August 2020, formalized into the release cadence as a longer code freeze? -> -> Depending on how we lay out the annual calendar, we might also get some benefit (or complexity) relative to the golang release cycle and our need to update golang twice on average across the patch release lifecycle of each release. This may also compound relative to distributors release cadences, support lifecycles, their stabilization and lead time between content selection and release, and their balancing interoperability across a larger set of dependencies where those are tied to specific months on the annual calendar. -> -> There's not an easy answer here that's going to work right for everybody, but while lots of folks are +1'ing the abstract concept it would be good to capture additional constraints and ideas that are otherwise implicit when they make the +1. - -@neolit123: - -> Golang has a "symmetric" model, so i think k8s should do the same. -> the "symmetric" choice however, would require more discipline and availability from contributors, so my vote here is to try "symmetric" and if it fails (maybe after one year) go "asymmetric". - -@khenidak: - -> @tpepper would be great if we add to this post typical kubecon(s) schedule, since most the community (those who are reviewing, approving, building, releasing changes) is also heavily engaged in these events. - -@vincepri: - -> Trying to summarize general feedback I've gathered today here and there for visibility. This particular issue started with "slowing down", although it quickly became a reflection on why and what can we do about it: -> -> Releases are hard, they need people to commit. -> There is not enough automation (true for probably all our projects and repositories). 
-> With each Kubernetes release, there is a world of clients that needs to be updated. -> Changes are hard to keep up with, and sometimes important things are buried in release notes. -> Some fear that less releases without policies (read: saying "no" more) isn't enough. -> 2020 -> -> Taking a step back, a few people are suggesting to fix some of these problem from a technical perspective (which is good in itself) and we should prioritize these efforts. From the other side, there is a general sentiment that we need to slow down for the sake of this community's health. -> -> These are both valid, and agreeing on a slower cadence is just the first step; going forward we should normalize taking a few steps back, reflect, and course-correct when things are becoming unsustainable. - -@cblecker: - -> Can folks comment on how they'd prefer this to look operationally? -> -> We should also talk about how this may impact things like code freeze (longer feature freeze with only bug/scale/stability fixes?). -> -> I'm +1 to 3 releases in general, but the details obviously matter. I'd also love to see this in a KEP if we come to consensus on making a change. - -@jberkus: - -> So, some pros/cons not previously mentioned on this thread: -> -> Additional Pros: -> -> Easier to schedule releases because we can fudge dates more to avoid Kubecons and holidays -> Reduced number of E2E, conformance, skew, and upgrade test jobs -> Means that new 1+ year patch support doesn't result in additional patch releases -> We'll get to 1.99 slower so we won't run out of digits -> -> Cons: -> -> Increased pressure by feature authors to get their feature in this release as opposed to waiting. -> Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze -> Extra problems with upstream dependency patch support if our timing is bad -> -> Regarding Tim's question of exact implementation: -> -> My vote is for symmetric releases in April, August, and December. 
While it's tempting to make December an "off month", development does happen all the time, and if it's not on a release, what is it on? -> -> That would be a reason for my 2nd choice, which would be Symmetric March, July, November, which puts Slow December at the beginning of the cycle instead of the end. However, that's mainly a benefit for working around Kubecon November, and there's no good reason to believe that Kubecon will be happening in November 2 years from now; it might be September or October instead. - -@alculquicondor: - -> Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze -> -> I think risk this can be reduced if we adjust (increase) the code freeze period. - -@jberkus: - -> Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze -> -> I think risk this can be reduced if we adjust (increase) the code freeze period. -> -> Better, how about we keep on top of flakes even if it's not Code Freeze? - -@alculquicondor: - -> Better, how about we keep on top of flakes even if it's not Code Freeze? -> -> Absolutely, but it's not easy to enforce without hurting development of non-violators. - -@jberkus: - -> Shutting down merges is the nuclear option for preventing flakes and fails. We should be able to keep on top of them without resorting to that. But ... we're getting off topic here, unless folks think the "increased time to flake" is a blocker for this (I don't). - -@johnbelamaric: - -> I am tentatively in favor of 3 releases per year, primarily because I believe 4 releases per year is too hard for folks to consume. Even 3 releases per year is probably too much for most, but the downsides of fewer releases make anything less than 3 too risky in my mind. -> -> As I see it, those downsides are some things already mentioned above: -> -> Build up of too much content in the release, and consequent potential for more painful upgrades. 
-> Very long lead time to get a feature to GA through the alpha/beta/stable phases. -> -> Before making this decision, I think we need mitigations for these. Those mitigations have extensive ripples in how we do our development. -> -> For (1), some mitigations are: -> a) More development out-of-tree / decoupling more components -> b) SIGs saying "no" more -> c) Stricter admission criteria in the release (higher bar from SIG Release, SIG Testing, PRR, WG Reliability, SIG Scalability, etc.) -> -> Of course some of these mitigations might make (2) worse. Other ideas? -> -> For (2), we may want to thinking about how we do our feature graduation process. Having three stages to go through and one less release per year to do it will stretch out how long it takes to add a feature quite substantially. This is a big topic we wouldn't want to gate on, but we may want to have a plan for before moving forward. Some options others have mentioned to me: -> a) Features are in, or out. That is, straight to GA, but with features not admitted to the main build until they are ready. This means we need some alternate build and perhaps dev or feature branches. -> b) Two stages instead of three. I am very skeptical of this, but there is some support for this in how K8s is actually used today. We did an operator survey in the PRR subproject and we found that: -> -> More than 90% of surveyed orgs allow (by policy) all Beta and GA features in prod. -> Less than 10% of surveyed orgs have ever disabled a beta feature in prod. The caveat is that both operators with more than 10,000 nodes under management that answered the survey have done this. -> -> This would indicate that people already treat beat as GA, for the most part. That's not necessarily a good thing, but it is a fact. Of course, the big benefit for us as contributors is that betas can be changed if we have made a mistake. So again we probably wouldn't want to use the current bar for beta and just map it to GA. 
We would need to raise that quite a bit. -> -> With respect to alpha, the idea there is to gather feedback. I think we get some limited feedback with it, but is it enough? If alpha and beta are not really serving their intended purpose, are they really that useful, as currently defined? -> -> Another option for (2) is breaking up the monolith more and allowing components to release independently. However, this could make test coverage nearly impossible, as individual components would need to be tested in various versions. Given that K8s is operated independently by thousands of organizations, I don't think we can treat the core components as completely independent. Nonetheless there may be some opportunities for decomposition (like we did with CoreDNS, for example). @thockin mentioned kubelet and kube-proxy in this regard. -> -> See also (there are probably many more issues like these): -> -> kubernetes/community#567 -> kubernetes/community#4000 - -@jberkus: - -> For (2), we may want to thinking about how we do our feature graduation process. Having three stages to go through and one less release per year to do it will stretch out how long it takes to add a feature quite substantially. -> -> Does it really, though? How many features actually went from alpha to GA in 9 months? -> -> Does anyone have hard data on this? - -@johnbelamaric: - -> Does it really, though? How many features actually went from alpha to GA in 9 months? -> -> Does anyone have hard data on this? -> -> Ok, that's a fair point to challenge that assumption. I agree probably most don't do it in 9 months, but data would help. I am not sure if "months" is the right measure, though. My concern is that people will still take the same number of releases to make it happen, which means it will take longer. -> -> On a similar note, I am curious if there is any data on the amount of feedback alpha features actually get. 
- -@jberkus: - -> Yah, and "Is there some outstanding blocker that prevents new features from actually going from alpha to GA in 3 releases for any reason other than maturity?" -> -> That is, if a feature can go from alpha to GA in 3 releases, that's fine. That's a year, and do we really want to make the case that it should take less than a year to get a new feature to GA? BUT ... if something in our process means that features realistically can't be introduced in 1.23 and go to beta in 1.24, then we have a potential problem, because that timeline gets very long if it's gonna actually take you 5 releases. - -@johnbelamaric: - -> The two points I bring up are in tension with each other. That is, the same -features spread across fewer annual releases automatically means more per -release, or longer duration in the pipeline. - -@bowei: - -> We should make sure that there are sufficient improvements/metrics/goals to meet w/ any change (or no change). -> It wouldn't be great if 4 -> 3 didn't improve things and the same rationales would justify 3 -> 2. -> Is there a bar where we can comfortably go back from 3 -> 4? - -@onlydole: - -> +1 for three releases a year, and all of this discussion is fantastic! -> -> I agree with there being three releases a year. However, I do think that having more regular minor version releases would be helpful, so there isn’t any rush to get things into a specific release, nor a blocker around shipping bugfixes or improvements. -> -> I’d like to propose a strongly scoped path for Alpha, Beta, and GA features. I believe that allowing for a bit more leniency for Alpha and Beta code promotion and more stringent requirements for features before they make GA status. - -## Explanatory - -@MIhirMishra: - -> What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA. -> More important is what is in the release than how frequently you are releasing. 
- -@johnbelamaric: - -> What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA. -> -> Not sure I understand the question. No one is suggesting deciding in advance - features will be advanced in their stage when they are ready. But the thing is, "when they are ready" depends on the release cadence. In order to go to beta, a feature has to have at least one alpha. In fact, realistically there will be more than one release with it in alpha, since it's really difficult to get meaningful feedback with a single cycle of alpha. Arguably, this becomes a little easier with longer time between releases, but realistically going from alpha release, to availability in downstream products, to real usage, to feedback and updated design and development is pretty hard to squeeze in before code freeze for the next release. -> -> Another way to think about this is that every feature goes through three state transitions: -> -> inception -> alpha -> alpha -> beta -> beta -> GA -> -> Thus, the minimum number of releases to get from inception to GA is three - about 9 months now versus 12 with the proposed schedule. Now, it is the rare feature that would be able to do this in 9 months, because we general need more than one release of alpha, and probably for beta too to get a decent signal on quality. -> -> At the same time, a constant level of effort exerted on K8s would mean that the same number of features could have state transitions in the same amount of time. With fewer releases per year, that means more state transitions per release. -> -> That is, elongating the cycle creates: more content (transitions) per release, and longer time for a given feature to transition through all the states. Higher latency (in terms of time) with more throughput (in terms of releases). -> -> We don't want higher latency with more throughput. 
Because more throughput means riskier and more difficult upgrades and higher latency means more pain for developers and their customers. -> -> So, what are the mitigations? They amount to reducing the number of transitions per release (to address my (1) above) and reducing the number of transitions per feature (to address my (2) above). -> -> Reducing the transitions per release can be done by: -> -> peeling features off of the monolithic release by pushing them out-of-tree (decomposition) -> SIGs saying "no" more (which pushes things out-of-tree, most likely) -> Requiring a higher bar for a state transition, thus making the effort involved to get to the next stage higher -> -> Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar higher for state transition - this dovetails with our goal per release. Another possibility is to classify features into low and high risk features. Low risk features could go straight to beta and skip the alpha phase, for example. - -## Data - -@jberkus: - -> All: I'm going to research what actual feature trajectory looks like through Kubernetes, because @johnbelamaric has identified that as a critical question. Stats to come. - -@ehashman: - -> @jberkus Any updates on the stats? :) - -@jberkus: - -> Nope, got bogged down with other issues, and the question of "what is a feature" in Kubernetes turns out to be a hard one to answer. We don't actually track features outside of a single release cycle; we track KEPs, which can either be part of a feature or the parent of several features, but don't match up 1:1 as features. So first I need to invent a way to identify "features" in a way that works for multiple release cycles. 
- -@jberkus - -> Sorry this has been forever, but answering the question of "how fast do our features advance" turns out to be really hard, because there is literally no object called a "feature" that persists reliably through multiple releases. -> -> To reduce the scope of the problem, I decided to limit this to tracked Enhancements introduced as alpha in 1.12 or 1.13, which were particularly fruitful releases for new features. Limiting it to Tracked kind of limits it to larger features, but I think these are the only ones required to go through alpha/beta/stable anyway (yes/no?). So, in 1.12 and 1.13: -> -> 20 new enhancements were introduced -> 7 did not follow a alpha/beta/stable path, mostly because the were removed or broken up into other features -> 2 are still beta -> 1 advanced in minimum time, that is 1 release alpha, 1 beta, then stable, in 9 months -> 4 advanced from alpha to beta in 1 release, but then took 2 or more releases to go to stable -> 7 advanced more slowly -> -> Our median number of releases for an enhancement to progress is: -> -> Alpha to Beta: 2 releases -> Beta to Stable: 3 releases -> Alpha to Stable: 6 releases -> -> Given this, it does not look like moving to 3 releases a year would slow down feature development due to the alpha/beta/stable progression requirements. -> -> I will note, however, that for many enhancements nagging by the Release Team during each release cycle did provide a goad to resume stalled development. So I'm at least a little concerned that if we make 3/year and don't change how we monitor feature progression at all, it will still slow things down because devs are being nagged less often. - -@ehashman - -> I am a little less worried about the "nag factor" now that we've moved to push-driven enhancements this release (SIG leads track, enhancements team accepts vs. enhancements team tracks). - -@jberkus - -> I'm very worried about it, see new thread. 
- -@johnbelamaric - -> I am more concerned about the "nag factor" due to the move to push-driven -> development; I think the lack of nagging will slow some features down. -> However, there are new things at play that will help with it - namely, the -> "no more permabeta" where things can't linger in beta forever because their -> API gets turned off automatically. At least for things with APIs. -> -> I strongly believe our alpha-to-stable latency, already quite high, will -> get worse with 3 releases per year. But ultimately, it's up to the feature -> authors. If they want it to go fast enough, they'll have to push for it -> more. Missing a train will have higher cost. -> -> Anyway, I personally am, as I said before, ambivalent on this decision. -> Putting on my GKE hat, it makes my life easier. Putting on my OSS hat, I -> have concerns but nothing that would make me strongly oppose it. It's not a -> change I would push for, but I think it's reasonable to see what happens if -> we try it. -> -> And thank you Josh for the analysis. The median 6 releases goes from 18 -> months to 24 months, which is not great but also something that is not -> forced on feature authors - they could push and get it done in less time if -> they need to. It's a rare feature that was making it in the 9 months -> before, so it would be a rare feature that is forced to have a longer cycle -> than they would have otherwise. - -@jberkus - -> I'm going to start a new thread on nag factors, because these are a bigger deal than I think folks want to address. -> -> There are two areas in the project, currently, that are almost entirely dependent on release team nagging (herafter RTN) for development: -> -> Getting features to GA (and to a lesser degree, to beta) -> Fixing test failures and flakes -> -> With the current 3-month cycle, this means that for 1 month of every cycle RTN doesn't happen, and as a result, these two activities don't happen. This is an extremely broken system. 
Contributors should not be dependent on nagging of any sort to take care of these project priorities, and the project as a whole shouldn't be depending on the RT for them except during Code Freeze. -> -> A 4-month cycle will make this problem worse, because we'll be looking a 2 months of every cycle where RTN won't happen, a doubling of the amount of time per year for tests to fail and alpha features to be forgotten. -> -> I am not saying that this is a reason NOT to do a 4-month cycle. I am saying that switching to a 4-month cycle makes fixing our broken RTN-based system an urgent priority. Fixing failing tests needs to happen year round. Reviewing features for promotion needs to happen at every SIG meeting. -> -> (FWIW, this is an issue that every large OSS project faces, which is why Code Freeze is to awful in so many projects) From 5f121c9730b39a54a18ee1af8e0c217dd05ea96f Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 02:34:32 -0400 Subject: [PATCH 03/34] 2572-release-cadence: Start distilling user stories and requirements Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 341 ++++++------------ 1 file changed, 117 insertions(+), 224 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index aa2b32bd12c..ae17966f5e4 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -49,23 +49,30 @@ SIG Architecture for cross-cutting KEPs). 
- [Summary](#summary) - [Motivation](#motivation) - [FIXME](#fixme) + - [TODO Deterministic](#todo-deterministic) + - [TODO Reduce risk](#todo-reduce-risk) - [Data](#data) - [Goals](#goals) - [FIXME Does that mandate a fixed frequency?](#fixme-does-that-mandate-a-fixed-frequency) - [FIXME Releases don’t necessarily have to be equally spaced](#fixme-releases-dont-necessarily-have-to-be-equally-spaced) - [Non-Goals](#non-goals) + - [TODO Release Team](#todo-release-team) + - [TODO Enhancement graduation](#todo-enhancement-graduation) - [FIXME Ideas](#fixme-ideas) - [FIXME Comment, without decision](#fixme-comment-without-decision) - [FIXME Needs response](#fixme-needs-response) - [FIXME Explanatory](#fixme-explanatory) - [Proposal](#proposal) - [User Stories (Optional)](#user-stories-optional) - - [Story 1](#story-1) - - [Story 2](#story-2) - - [FIXME Support](#fixme-support) + - [TODO End User](#todo-end-user) + - [TODO Distributors and downstream projects](#todo-distributors-and-downstream-projects) + - [TODO Contributors](#todo-contributors) + - [TODO SIG Release members](#todo-sig-release-members) + - [TODO @neolit123](#todo-neolit123) - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) + - [TODO Schedule](#todo-schedule) - [FIXME Implementation Details](#fixme-implementation-details) - [Test Plan](#test-plan) - [Graduation Criteria](#graduation-criteria) @@ -191,6 +198,36 @@ I'd prefer three releases/year. /milestone v1.20 /priority important-longterm +#### TODO Deterministic + +@akutz: + +> I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. 
So a big +1 from me. + +#### TODO Reduce risk + +@Klaven: + +> I see some people attributing drift to longer release cycles (we are only talking about extending them by a month, not 3 months), but I would argue that fast release cycles have caused their own amount of drift, never mind the burden on the release team. +> +> Look at GKE, for example. Versions 1.14 to 1.17 are supported. GKE is arguably one of the best Kubernetes providers and there is a LOT of drift because corporations don't like continuous rapid change and find it hard to support. I also think that as a project matures the rate of change of the increasingly stable and feature-complete core should decrease. At some point the plugins and the out-of-tree projects should be where more change happens. Projects like cluster-api and the like get the attention which used to be focused on maturing the core. +> +> I know that recently there has been a lot of focus on how many releases something needs in order to become GA. I think that honestly is the wrong approach. +> +> I do think it's valid to be concerned that the release of k8s is too much work. I would say this means this system is too laborious. Given that it is this much work, rushing it more would probably only hurt us more. It's obvious that the ecosystem has already felt this strain. If we want to be able to release frequently, we need the release process to become painless. If we don't fix that problem, I don't see any solution other then pushing releases to a manageable cadence. +> +> If we want to release quickly, we need to think not only about the release team, but also the downstream; the adopters. If we want to release quickly and frequently, then we need to focus on making the upgrade process even easier and similar things. + +@ehashman: + +> While some folks in the thread note that this increases the heft/risk of each release, I actually think less releases will reduce risk. 
I'm speaking from an operations perspective as opposed to a development perspective. +> +> The current Kubernetes release cadence is so high that most organizations cannot keep up with making a major version update every 3 months regularly, or going out of security support in less than a year. While in theory, releasing more frequently reduces the churn and risk of each release, this is only true if end users are actually able to apply the upgrades. +> +> In my experience, this is very challenging and I have not yet seen any organization consistently keep up with the 3 month major upgrade pace for production clusters, especially at a large scale. So, what effectively happens is that end users upgrade less frequently than 3 months, but since that isn't supported, they end up in the situation where they are required to jump multiple major releases at once, which effectively results in much higher risk. +> +> 4 vs. 3 releases is >30% more release work, but I do not believe it provides benefit proportional to that work, nor does a quarterly major release cadence match the vast majority of operation teams' upgrade cycles. + #### Data @jberkus: @@ -308,6 +345,38 @@ What is out of scope for this KEP? Listing non-goals helps to focus discussion and make progress. --> +#### TODO Release Team + +@saschagrunert: + +> On the other hand it will give less people a chance to participate in the shadowing program for example. Anyways, I think we will find appropriate solutions around those kind of new challenges. + +@jeremyrickard: + +> The one downside is that we will remove an opportunity for shadowing, and as we saw this time around we had >100 people apply, and this will remove ~24-ish opportunities. I think we can maybe identify some opportunities for folks that want to be involved though. 
takes off release lead hat + +@wilsonehusin: + +> as someone who started getting involved in this project through shadowing release team, I'd like to echo what @saschagrunert & @jeremyrickard raised above regarding shadow opportunities -- I'm glad we're acknowledging the downside and hope we can keep in mind to present other opportunities for folks to get involved! + +@kcmartin: + +> As to the potential for limiting shadow opportunities (mentioned by @jeremyrickard, @wilsonehusin, and others), I'm definitely tuned in to that being a downside, since I've served as a SIG-Release shadow three times, and I think it's a fantastic opportunity! +> +> One possible way to alleviate that downside would be to have 5 shadows, instead of three or four, per sub-team. I believe this is still a manageable number for the Leads, and could distribute the work more evenly. + +@pires: + +> On a more personal note, (@jeremyrickard wink, wink) I applied for release shadow believing I'd be picked given my past contributions to the project and my justification to be selected over others. Being rejected was a humbling experience and I'm happy to let you know I didn't lose any of the appetite to contribute. Others may feel differently but, then again, the project is maturing and so should the community. + +#### TODO Enhancement graduation + +@jberkus: + +> @johnbelamaric everything you've said is valid. At the same time, though, my experience has been that the pressure goes the other way: features already linger in alpha or beta for way longer than they ought to. The push to get most features to GA -- or deprecate them -- really seems to be lacking. It's hard to pull stats for this, but most KEP-worthy features seem to take something like 2 years to get there. So from my perspective, more state changes per release would be a good thing (at least, more getting alpha features to beta/GA), even if we didn't change the number of releases per year. 
+> +> It's hard to tell whether or not switching to 3 releases a year would affect the slow pace of finishing features at all. + #### FIXME Ideas @sftim: @@ -400,140 +469,26 @@ the system. The goal here is to make this feel real for users without getting bogged down. --> -#### Story 1 - -#### Story 2 - -#### FIXME Support - -@markyjackson-taulia: - -> I am a +1 for 3 releases/year. As you noted, this will allow us to work on items that can enhance things - -@saschagrunert: - -> My personal opinion is that three releases per year are enough. This means more time can be spent on actual feature development, without narrowing them down just to fit into the release cycle. It will also reduce the management overhead for SIG Release (and the release engineering). -> -> On the other hand it will give less people a chance to participate in the shadowing program for example. Anyways, I think we will find appropriate solutions around those kind of new challenges. - -@mkorbi: - -> I don’t have a strong preference on the one or the other. But from what I -> see from our clients or even from public cloud providers, 4 release seems -> to be a huge struggle. -> Therefore -> +1 4 3 (you get joke in it) - -@timothysc: - -> Huge +1. In a series of user surveys from SIG Cluster lifecycle, we've consistently found that users struggle to keep up with 4 releases a year and have data to show this. -> -> cc @neolit123 - -@neolit123: - -> +1 -> -> Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four. -> -> i find these results surprising. we had the same question in the latest SIG CL survey and the most picked answer was "not sure". this tells me that users possibly do not understand all the implications or it does not matter to them. 
-> -> a couple of benefits that i see from the developer side: -> -> with the yearly support KEP we get 3 releases to maintain -> less e2e test jobs -> -> as long as SIG Arch are on board we should just proceed with the change. -> -> EDIT: to Tim's point -> -> Huge +1. In a series of user surveys from SIG Cluster lifecycle, we've consistently found that users struggle to keep up with 4 releases a year and have data to show this. -> -> this however we did see, users struggle to upgrade. - -@jeremyrickard: - -> less churn for external consumers -> -> puts on external consumer hat This would be super great. We really, really struggle as is. takes off external consumer hat -> -> one less quarterly Release Team to recruit for -> -> puts on release lead hat This I am 100% in agreement on. When we kicked off 1.18 (because of the extra time over the new year and stuff), we had extra time to solicit shadows and as the enhancements lead for 1.18, I was really able to onboard my shadows early and hit the ground running. For 1.20, we didn't have a large amount of time between identifying an EA, 1.19 ending, and 1.20 starting so selecting the shadows and getting things rolling was a challenge, especially for the front loaded work Enhancements has to do. The one downside is that we will remove an opportunity for shadowing, and as we saw this time around we had >100 people apply, and this will remove ~24-ish opportunities. I think we can maybe identify some opportunities for folks that want to be involved though. takes off release lead hat -> -> I think in general, this gives people more time do planning within their SIG, work on KEPS that fall out of that planning, and maybe work toward general health and well being of tests? -> -> Probably a lot to work out if we make that decision, but overall I am pretty strongly in favor of this. - -@yahorse: - -> With everything going on three releases is fine, we should be considerate of the challenges everyone has right now. 
- -@frapposelli: - -> Big +1 on the 3 releases/year cadence. As the project matures, having less churn becomes a very desirable feature wink - -@ArangoGutierrez: - -> ++ for 3 releases a year - -@Klaven: - -> +1 for 3. +TODO: Add general note about state of the world and human challenges -@hasheddan: +#### TODO End User -> +1 for 3 releases +- Hard to keep up with four releases / too much churn + - TODO: Get data from SIG Cluster Lifecycle + - One quarter where teams cannot focus on infra work + - 16 weeks with 4 weeks for holiday buffer works well -@ameukam: - -> +1 for 3 releases a year. - -@savitharaghunathan: - -> Huge +1 for three releases a year. From an end user perspective it puts less toil on the teams to keep up with the upgrades and administrative work associated with it. - -@benhemp: - -> +1 -> -> 3 as a consumer team. because almost guaranteed my team will have one quarter where we have to focus elsewhere. Two or one my fear is the machine of getting things to the finish line gets rusty if exercised too infrequently. 16 weeks to ship, extra 4 weeks buffer for holidays for the consuming teams also works out pretty well. - -@wilsonehusin: - -> +1 for 3 releases/year, agree with what many have stated well regarding churn -> -> as someone who started getting involved in this project through shadowing release team, I'd like to echo what @saschagrunert & @jeremyrickard raised above regarding shadow opportunities -- I'm glad we're acknowledging the downside and hope we can keep in mind to present other opportunities for folks to get involved! - -@kcmartin: - -> 3 Releases/year, I am on board with this! -> -> This option seems to open up a lot of opportunity to keep the pace more reasonable, and keep folks from burning out too quickly! 
-> -> As to the potential for limiting shadow opportunities (mentioned by @jeremyrickard, @wilsonehusin, and others), I'm definitely tuned in to that being a downside, since I've served as a SIG-Release shadow three times, and I think it's a fantastic opportunity! -> -> One possible way to alleviate that downside would be to have 5 shadows, instead of three or four, per sub-team. I believe this is still a manageable number for the Leads, and could distribute the work more evenly. - -@bai: - -> Big +1 on having 3 releases/year. - -@LappleApple: - -> Absolutely +1 for three releases per year, for reasons already stated here. - -@palnabarun: - -> +3000 for 3 releases a year. heart_eyes_cat +@OmerKahani: -@recollir: +> 3 is the maximum upgrades that we can do in my company. Our customers are eCommerce merchants, so from September to December (include), we are in a freeze on all of the infrastructure. -> +1 for 3 releases. It will help downstream projects to be able to keep up. I mainly think on all “installers” out there, but also all other cluster add-ons. +#### TODO Distributors and downstream projects -@SayakMukhopadhyay: +https://www.cncf.io/certification/software-conformance/ -> +1 for 3 releases. It will also help some cloud providers playing catch-up to come to parity. +- Keeping up for both installers and cluster addons +- Cloud provider parity +- Less upgrades helps complex workloads @leodido: @@ -541,114 +496,42 @@ bogged down. > > Making users able to catch-up is more important than keeping a pace so fast that can lead nowhere (we experienced the same with https://github.com/falcosecurity/falco and we switched to 6 releases per year from 12). -@vincepri: - -> Big +100 for 3 releases / year cadence. - -@ncdc: - -> +1 to fewer releases / year. - -@fabriziopandini: - -> +1 to 3 releases - -@nader-ziada: - -> +1 on having 3 releases/year - -@oldthreefeng: - -> +1 to 3 releases / years . 
- -@cpanato: - -> The bot replied for me, but this is myself -> #1290 (comment) -> -> +1 for 3 releases a year - -@jackfrancis: - -> +1 on 3 releases per year -> -> Thanks @justaugustus! - -@akutz: - -> I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. So a big +1 from me. - -@pires: - -> +1 on 3 releases a year. -> -> And as noted over Twitter, given someone's concerns on expecting same amount of changes over 25% less releases, I think it's of paramount importance for SIGs to step up and limit the things they include in a release, balancing what matters short/ long-term and kicking out all that can be done outside of the release cycle (we have CRDs, custom API servers, scheduling plug-ins, and so on). Now, I understand it's hard, sometimes even painful, to manage the enthusiasm some like me have on things close to them they want to see gaining traction but the early days are gone and this is now a solid OSS project that requires mature contributors. I'll try and do my part. -> -> On a more personal note, (@jeremyrickard wink, wink) I applied for release shadow believing I'd be picked given my past contributions to the project and my justification to be selected over others. Being rejected was a humbling experience and I'm happy to let you know I didn't lose any of the appetite to contribute. Others may feel differently but, then again, the project is maturing and so should the community. - -@mpbarrett: - -> As more and more complex workloads move to Kube, having less upgrades in a single year is a good thing. Which months (ie April, August, Dec would be every 4 months from the beginning of a year) would be important for me as a user to know. 
- -@OmerKahani: - -> +1 for 3 -> -> 3 is the maximum upgrades that we can do in my company. Our customers are eCommerce merchants, so from September to December (include), we are in a freeze on all of the infrastructure. -> -> @tpepper for your question - the best month for us will be March, July, November-December. - -@tasdikrahman: - -> +1 on moving to 3 releases, this will definitely help us (end users) in keeping up with the release cadence, by reducing the toil and effort required by us to upgrade versions, if we move from 4 releases to 3. - -@xmudrii: - -> +1 for 3 releases a year. - @afirth: -> I guess most end users are blocked by their upstream distro's ability to keep up with the K8s release. For example, GKE rapid channel is currently on 1.18, but 1.19 released in August. Somebody previously mentioned kops has similar issues (also currently on 1.18). I'm curious whether this is because those providers routinely find issues, or because it takes some fixed time to implement the new capabilities and changes. Either way, I don't think this change would impact end user's ability to get new features in a timely fashion much. So, besides the reduced shadow capacity and the complexity of actually making the change, what are the downsides? -> -> +1 for me. +> I guess most end users are blocked by their upstream distro's ability to keep up with the K8s release. For example, GKE rapid channel is currently on 1.18, but 1.19 released in August. Somebody previously mentioned kops has similar issues (also currently on 1.18). I'm curious whether this is because those providers routinely find issues, or because it takes some fixed time to implement the new capabilities and changes. Either way, I don't think this change would impact end user's ability to get new features in a timely fashion much. 
-@Klaven: +#### TODO Contributors -> I see some people attributing drift to longer release cycles (we are only talking about extending them by a month, not 3 months), but I would argue that fast release cycles have caused their own amount of drift, never mind the burden on the release team. -> -> Look at GKE, for example. Versions 1.14 to 1.17 are supported. GKE is arguably one of the best Kubernetes providers and there is a LOT of drift because corporations don't like continuous rapid change and find it hard to support. I also think that as a project matures the rate of change of the increasingly stable and feature-complete core should decrease. At some point the plugins and the out-of-tree projects should be where more change happens. Projects like cluster-api and the like get the attention which used to be focused on maturing the core. -> -> I know that recently there has been a lot of focus on how many releases something needs in order to become GA. I think that honestly is the wrong approach. -> -> I do think it's valid to be concerned that the release of k8s is too much work. I would say this means this system is too laborious. Given that it is this much work, rushing it more would probably only hurt us more. It's obvious that the ecosystem has already felt this strain. If we want to be able to release frequently, we need the release process to become painless. If we don't fix that problem, I don't see any solution other then pushing releases to a manageable cadence. -> -> If we want to release quickly, we need to think not only about the release team, but also the downstream; the adopters. If we want to release quickly and frequently, then we need to focus on making the upgrade process even easier and similar things. -> -> As it is. I think quarterly is too much because of all the above issues. Sadly, I have not gotten as much time contributing to Kubernetes as I would like, so I will butt out of the conversation. I just wanted to put my thoughts down. 
+- Time for project enhancements +- Time for feature development +- Time for planning / KEPs +- Time for health and well-being of tests +- Time for mental health / curtailing burnout +- Time for KubeCon execution +- Further show of maturity with less churn -@yboujallab: +@pires: -> +1 for 3 releases / year +> And as noted over Twitter, given someone's concerns on expecting same amount of changes over 25% less releases, I think it's of paramount importance for SIGs to step up and limit the things they include in a release, balancing what matters short/ long-term and kicking out all that can be done outside of the release cycle (we have CRDs, custom API servers, scheduling plug-ins, and so on). Now, I understand it's hard, sometimes even painful, to manage the enthusiasm some like me have on things close to them they want to see gaining traction but the early days are gone and this is now a solid OSS project that requires mature contributors. -@jberkus: +#### TODO SIG Release members -> @johnbelamaric everything you've said is valid. At the same time, though, my experience has been that the pressure goes the other way: features already linger in alpha or beta for way longer than they ought to. The push to get most features to GA -- or deprecate them -- really seems to be lacking. It's hard to pull stats for this, but most KEP-worthy features seem to take something like 2 years to get there. So from my perspective, more state changes per release would be a good thing (at least, more getting alpha features to beta/GA), even if we didn't change the number of releases per year. -> -> It's hard to tell whether or not switching to 3 releases a year would affect the slow pace of finishing features at all. 
+- Reduce management overhead for SIG Release / Release Engineering +- With the yearly support KEP, we only have three 3 releases to maintain +- One less quarterly Release Team to recruit for -@ehashman: +##### TODO @neolit123 -> I was on leave for the past two weeks so chiming in a little late... -> -> I am +1 to 3 releases per year. +> Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four. > -> While some folks in the thread note that this increases the heft/risk of each release, I actually think less releases will reduce risk. I'm speaking from an operations perspective as opposed to a development perspective. +> i find these results surprising. we had the same question in the latest SIG CL survey and the most picked answer was "not sure". this tells me that users possibly do not understand all the implications or it does not matter to them. > -> The current Kubernetes release cadence is so high that most organizations cannot keep up with making a major version update every 3 months regularly, or going out of security support in less than a year. While in theory, releasing more frequently reduces the churn and risk of each release, this is only true if end users are actually able to apply the upgrades. +> a couple of benefits that i see from the developer side: > -> In my experience, this is very challenging and I have not yet seen any organization consistently keep up with the 3 month major upgrade pace for production clusters, especially at a large scale. So, what effectively happens is that end users upgrade less frequently than 3 months, but since that isn't supported, they end up in the situation where they are required to jump multiple major releases at once, which effectively results in much higher risk. +> with the yearly support KEP we get 3 releases to maintain +> less e2e test jobs > -> 4 vs. 
3 releases is >30% more release work, but I do not believe it provides benefit proportional to that work, nor does a quarterly major release cadence match the vast majority of operation teams' upgrade cycles. +> as long as SIG Arch are on board we should just proceed with the change. ### Notes/Constraints/Caveats (Optional) @@ -682,6 +565,16 @@ required) or even code snippets. If there's any ambiguity about HOW your proposal will be implemented, this is the place to discuss them. --> +### TODO Schedule + +@mpbarrett: + +> Which months (ie April, August, Dec would be every 4 months from the beginning of a year) would be important for me as a user to know. + +@OmerKahani: + +> @tpepper for your question - the best month for us will be March, July, November-December. + ### FIXME Implementation Details @tpepper: From 4c0f38c2c9a1d24cd436d42ddfa04b177fc07b0a Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 02:51:41 -0400 Subject: [PATCH 04/34] 2572-release-cadence: Move leads feedback into KEP sections Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 134 +++++------------- 1 file changed, 38 insertions(+), 96 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index ae17966f5e4..7a5524d254a 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -55,6 +55,8 @@ SIG Architecture for cross-cutting KEPs). 
- [Goals](#goals) - [FIXME Does that mandate a fixed frequency?](#fixme-does-that-mandate-a-fixed-frequency) - [FIXME Releases don’t necessarily have to be equally spaced](#fixme-releases-dont-necessarily-have-to-be-equally-spaced) + - [TODO Create data](#todo-create-data) + - [TODO Blocking upgrade tests](#todo-blocking-upgrade-tests) - [Non-Goals](#non-goals) - [TODO Release Team](#todo-release-team) - [TODO Enhancement graduation](#todo-enhancement-graduation) @@ -79,6 +81,7 @@ SIG Architecture for cross-cutting KEPs). - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - [Version Skew Strategy](#version-skew-strategy) - [Implementation History](#implementation-history) + - [Leads meeting feedback](#leads-meeting-feedback) - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) - [FIXME](#fixme-1) @@ -87,15 +90,6 @@ SIG Architecture for cross-cutting KEPs). - [No](#no) - [Maintenance releases](#maintenance-releases) - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) -- [FIXME Cleanup](#fixme-cleanup) - - [How do we make a decision?](#how-do-we-make-a-decision) - - [Canonical](#canonical) - - [Alternatives](#alternatives-1) - - [Do we have any data?](#do-we-have-any-data) - - [How do we implement?](#how-do-we-implement) - - [Conversations](#conversations) - - [Leads meeting feedback](#leads-meeting-feedback) - - [From Jeremy](#from-jeremy) ## Release Signoff Checklist @@ -204,6 +198,8 @@ I'd prefer three releases/year. > I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. So a big +1 from me. +TODO: Stick to schedule and just not cut a 4th release? 
+ #### TODO Reduce risk @Klaven: @@ -338,6 +334,33 @@ As a consumer, I'd be looking for some predictability in the schedule. See point on predictability. +#### TODO Create data + +Elana: + +> additional ask: can we send out a real survey to end users + +Primarily anecdotal from SIG Release members, vendors, and end users. + +AI: + +- (to Elana) What kind of data specifically are we looking for? + - Who's the audience? End users or principals? + - Should we just do this all of the time post-release? +- (to Josh) What did we discover regarding feature trajectory? + +Thoughts: +I would want requesters to be very explicit about the kind of data we're interested in. +SIG Release and others can work on collection, but we need to make sure this isn't a continually moving target. + +We're also starting from a disadvantage trying to compare our status quo to something we haven't tried for a sustained period of time. + +#### TODO Blocking upgrade tests + +Aaron C: + +> Can we make upgrade jobs / tests blocking to make the upgrade between versions better + ### Non-Goals +### [Leads meeting](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr) feedback + ## Drawbacks - -## FIXME Cleanup - -### How do we make a decision? - -#### Canonical - -Write a KEP - -- Seek approval from: - - SIG Release - - SIG Architecture - - SIG Testing - - (maybe) Steering -- Set a lazy consensus timeout - -#### Alternatives - -- A survey - - What would this need to contain to be effective? -- Vote to stakeholders (SIG leads + Steering) - - Is there precedence for this outside of elections? - - Should this include subproject owners? - - 1 (Strong disagree) - 5 (Strong agree) - -### Do we have any data? - -Primarily anecdotal from SIG Release members, vendors, and end users. - -AI: - -- (to Elana) What kind of data specifically are we looking for? - - Who's the audience? End users or principals? 
- - Should we just do this all of the time post-release? -- (to Josh) What did we discover regarding feature trajectory? - -Thoughts: -I would want requesters to be very explicit about the kind of data we're interested in. -SIG Release and others can work on collection, but we need to make sure this isn't a continually moving target. - -We're also starting from a disadvantage trying to compare our status quo to something we haven't tried for a sustained period of time. - -### How do we implement? - -TBD -AI: Expand - -Thoughts: -I feel like less is going to change in the process than people think. - -- Make the decision -- Set the schedule - -### Conversations - -#### [Leads meeting](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr) feedback - -- Q: how are we making a decision? -- [comment]: 1.21 is the real EOY release as its scheduler covers december -- Do we have any data? -- [comment]: does sig-arch, sig-release, steering, etc own the final decision? -- [comment]: sep question of how we implement; does that mandate a fixed freq? - - Stick to schedule and just not cut a 4th release? -- [comment]: releases don’t necessarily have to be equally spaced -- [comment]: cadence doesn't feel like things get the attention they deserve, things always feel rushed. Not a lot of space for people to take a step back. -- [comment]: should upgrade testing improve? Be blocking? - -#### From Jeremy - -John B: - -> Ask in the meeting, how are we going to make the actual decision for 3 vs 4? -> are we going to vote? or will SIG Release just make the decision? - -Elana: - -> additional ask: can we send out a real survey to end users - -Daniel: - -> the concern about things "taking longer" to go stable because of # of releases in beta also came up again, can we think of a way to handle this? -> Do the three releases need to be evenly spaced? 
- -Aaron C: - -> Can we get more "implementation" details about how three releases would word? -> Can we make upgrade jobs / tests blocking to make the upgrade between versions better From ef250f78dcc51d5063f7afe8569d6a6b5234b95c Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 03:06:06 -0400 Subject: [PATCH 05/34] 2572-release-cadence: Create section about concentrated risk Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 84 ++++++++++--------- 1 file changed, 43 insertions(+), 41 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 7a5524d254a..f86d8484aaf 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -73,6 +73,7 @@ SIG Architecture for cross-cutting KEPs). - [TODO @neolit123](#todo-neolit123) - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Risks and Mitigations](#risks-and-mitigations) + - [TODO Concentrating risk](#todo-concentrating-risk) - [Design Details](#design-details) - [TODO Schedule](#todo-schedule) - [FIXME Implementation Details](#fixme-implementation-details) @@ -87,7 +88,6 @@ SIG Architecture for cross-cutting KEPs). - [FIXME](#fixme-1) - [LTS](#lts) - [Go faster](#go-faster) - - [No](#no) - [Maintenance releases](#maintenance-releases) - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) @@ -527,6 +527,10 @@ https://www.cncf.io/certification/software-conformance/ > I guess most end users are blocked by their upstream distro's ability to keep up with the K8s release. For example, GKE rapid channel is currently on 1.18, but 1.19 released in August. Somebody previously mentioned kops has similar issues (also currently on 1.18). I'm curious whether this is because those providers routinely find issues, or because it takes some fixed time to implement the new capabilities and changes. 
Either way, I don't think this change would impact end user's ability to get new features in a timely fashion much. +@sebgoa: + +> Users and even cloud providers seem to struggle to keep up with the releases (e.g 1.18 is not yet available on GKE for instance), so this also seems to indicate that less releases would ease the work of users and providers. + #### TODO Contributors - Time for project enhancements @@ -547,6 +551,10 @@ https://www.cncf.io/certification/software-conformance/ - With the yearly support KEP, we only have three 3 releases to maintain - One less quarterly Release Team to recruit for +@sebgoa: + +> The kubernetes releases have been a strong point of the software since its inception. The quality, testing and general care has been amazing and only improved (my point of reference is releases of some apache foundation software). With the increased usage, scrutiny and complexity of the software it feels like each release is a huge effort for the release team so naturally less releases could mean a bit less work. + ##### TODO @neolit123 > Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four. @@ -583,6 +591,40 @@ How will UX be reviewed, and by whom? Consider including folks who also work outside the SIG or subproject. --> +#### TODO Concentrating risk + +@adrianotto: + +> -1 +> +> I acknowledge this proposed change will not slow the rate of change, but it does concentrate risk. It means that each release would carry more change, and more risk. It also means that adoption of those features will be slower, and that's bad for users. +> +> Release early and release often. This philosophy is a key reason k8s matured as quickly as it did. I accept that 2020 is a strange year, and should be handled as such. That is not a valid reason to change what is done in subsequent years. Each time you make a change like this, it has a range of unintended consequences, such as the risk packing I mentioned above. 
It would be tragic to slow overall slowdown in the promotion of GA features because they transition based on releases, not duration in use. If the release process is burdensome, we should be asking how we can apply our creativity to make it easier, and reducing the release frequency might be one of several options. But asking the question this way constrains us from looking at the bigger picture, and fully considering what will serve the community best. + +@bowei: + +> Echoing Adrian's comment: +> +> I think releases are a nice forcing function towards stabilization and having less releases will increase drift in the extra time. +> Are we coupling feature(s) stabilization to release cadence too much? +> One fear is that the work simply going to be pushed rather than decrease, but now there are fewer "stabilization" points in the year. + +@spiffxp: + +> I'm a net -1 on 3 releases per year, but I understand I'm in the minority. Reducing the frequency of a risky/painful process does not naturally lead to a net reduction of pain or risk, and usually incentivizes increased risk. "Stabilize the patient" can be a good first step, but is insufficient on its own. +> +> To @tpepper's question of implementation, if we go with 3 symmetric releases, I would suggest using the "extra time" as a tech debt / process debt paydown phase at the beginning of each release cycle. Somewhat like how we left the milestone restriction in place at the beginning of the 1.20 release cycle. This would provide opportunity to pay down tech debt / process debt that involves large refactoring or breaking changes, the sort of work that is actively discouraged during the code freeze leading up to a release. +> +> I may have too narrow a view, but I have concerns that an April / August / December cadence puts undue pressure to land in August. I'm thinking of industries that typically go through a seasonal freeze in Q4. 
Shifting forward by a month (January / May / September) or two (February, June, October) may relieve some of that pressure, though it does cause one release to straddle Q4/Q1 in an awkward way. +> +> Another option is to declare Q4 what it has been in practice, a quieter time during which we're not actually going to push hard on a release, but I don't think that works as well with 3 releases vs. 4. + +@sebgoa: + +> But, generally speaking less releases (or less frequent minor releases) will also mean that each release will pack more weight, which means it will need even more testing and it will make upgrades tougher. +> +> With less releases developers will tend to rush their features at the last minute to "get it in" because the next one will be further apart. + ## Design Details @@ -507,24 +505,6 @@ Daniel: > the concern about things "taking longer" to go stable because of # of releases in beta also came up again, can we think of a way to handle this? -#### FIXME Ideas - -@sftim: - -> If there were an unsupported-but-tested Kubernetes release cut and published once a week - what would that mean? -> -> I'm imagining something that passes the conformance tests (little point otherwise) but comes with no guarantee. The Rust project has a model a bit like this with a daily unstable release which has nevertheless been through lots of automated testing. -> -> When I'm typing this I'm imagining that I could run minikube start --weekly-unstable and get a local test cluster based on the most recent release. If Kubernetes already had that built and working, would people pick different answers? - -@jberkus: - -> @sftim yeah, you've noticed that the reason, right now, we don't see a lot of community testing on alphas and betas is that we don't make them easy to consume. -> -> I'd say that it would need to go beyond that: we'd need images, minikube, and kubeadm releases for each weekly release. 
-> -> I don't know how that would affect our choice of major release cadence (isn't it orthagonal?) but it would be a great thing to do regardless. Also very hard. - #### FIXME Comment, without decision @sftim: @@ -964,9 +944,7 @@ not need to be as detailed as the proposal, but should include enough information to express the idea and why it was not acceptable. --> -### FIXME - -#### LTS +### TODO LTS @kellycampbell: @@ -992,7 +970,7 @@ information to express the idea and why it was not acceptable. > @youngnick you summed it up. Having 2 years of support for a specific API snapshot is unrealistic right now for all sorts of reasons, and it wasn't even clear that it was what people actually wanted. -#### Go faster +### TODO Go faster @sebgoa: @@ -1010,7 +988,23 @@ information to express the idea and why it was not acceptable. > > I don't think going from 4->3 releases will create this problem, though I do think going to 2 or 1 release would. We need some plan around the mitigations I described earlier though, to ensure we avoid this fate. -#### Maintenance releases +@sftim: + +> If there were an unsupported-but-tested Kubernetes release cut and published once a week - what would that mean? +> +> I'm imagining something that passes the conformance tests (little point otherwise) but comes with no guarantee. The Rust project has a model a bit like this with a daily unstable release which has nevertheless been through lots of automated testing. +> +> When I'm typing this I'm imagining that I could run minikube start --weekly-unstable and get a local test cluster based on the most recent release. If Kubernetes already had that built and working, would people pick different answers? + +@jberkus: + +> @sftim yeah, you've noticed that the reason, right now, we don't see a lot of community testing on alphas and betas is that we don't make them easy to consume. 
+> +> I'd say that it would need to go beyond that: we'd need images, minikube, and kubeadm releases for each weekly release. +> +> I don't know how that would affect our choice of major release cadence (isn't it orthagonal?) but it would be a great thing to do regardless. Also very hard. + +### TODO Maintenance releases @youngnick: From 52a828082508b79c08834097785e9ca80d84a798 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 04:05:12 -0400 Subject: [PATCH 08/34] 2572-release-cadence: Move remaining FIXMEs to the appropriate headings Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 111 ++++++++---------- 1 file changed, 49 insertions(+), 62 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 5d9fe472474..edc17932185 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -47,14 +47,11 @@ SIG Architecture for cross-cutting KEPs). - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) -- [Motivation](#motivation) - - [FIXME](#fixme) - - [TODO Deterministic](#todo-deterministic) - - [TODO Reduce risk](#todo-reduce-risk) +- [TODO Motivation](#todo-motivation) - [Data](#data) - [Goals](#goals) - - [FIXME Does that mandate a fixed frequency?](#fixme-does-that-mandate-a-fixed-frequency) - - [FIXME Releases don’t necessarily have to be equally spaced](#fixme-releases-dont-necessarily-have-to-be-equally-spaced) + - [TODO Deterministic](#todo-deterministic) + - [TODO Reduce risk](#todo-reduce-risk) - [TODO Create data](#todo-create-data) - [TODO Blocking upgrade tests](#todo-blocking-upgrade-tests) - [TODO More automation](#todo-more-automation) @@ -63,9 +60,8 @@ SIG Architecture for cross-cutting KEPs). 
- [Non-Goals](#non-goals) - [TODO Release Team](#todo-release-team) - [TODO Enhancement graduation](#todo-enhancement-graduation) - - [FIXME Comment, without decision](#fixme-comment-without-decision) - - [FIXME Needs response](#fixme-needs-response) - - [FIXME Explanatory](#fixme-explanatory) + - [TODO Further decoupling core](#todo-further-decoupling-core) + - [TODO Modifying SIG Architecture policies](#todo-modifying-sig-architecture-policies) - [Proposal](#proposal) - [User Stories (Optional)](#user-stories-optional) - [TODO End User](#todo-end-user) @@ -154,7 +150,7 @@ updates. [documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md --> -## Motivation +## TODO Motivation -### FIXME - What would you like to be added: We should formally discuss whether or not it's a good idea to modify the kubernetes/kubernetes release cadence. @@ -195,38 +189,6 @@ I'd prefer three releases/year. /milestone v1.20 /priority important-longterm -#### TODO Deterministic - -@akutz: - -> I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. So a big +1 from me. - -TODO: Stick to schedule and just not cut a 4th release? - -#### TODO Reduce risk - -@Klaven: - -> I see some people attributing drift to longer release cycles (we are only talking about extending them by a month, not 3 months), but I would argue that fast release cycles have caused their own amount of drift, never mind the burden on the release team. -> -> Look at GKE, for example. Versions 1.14 to 1.17 are supported. 
GKE is arguably one of the best Kubernetes providers and there is a LOT of drift because corporations don't like continuous rapid change and find it hard to support. I also think that as a project matures the rate of change of the increasingly stable and feature-complete core should decrease. At some point the plugins and the out-of-tree projects should be where more change happens. Projects like cluster-api and the like get the attention which used to be focused on maturing the core. -> -> I know that recently there has been a lot of focus on how many releases something needs in order to become GA. I think that honestly is the wrong approach. -> -> I do think it's valid to be concerned that the release of k8s is too much work. I would say this means this system is too laborious. Given that it is this much work, rushing it more would probably only hurt us more. It's obvious that the ecosystem has already felt this strain. If we want to be able to release frequently, we need the release process to become painless. If we don't fix that problem, I don't see any solution other then pushing releases to a manageable cadence. -> -> If we want to release quickly, we need to think not only about the release team, but also the downstream; the adopters. If we want to release quickly and frequently, then we need to focus on making the upgrade process even easier and similar things. - -@ehashman: - -> While some folks in the thread note that this increases the heft/risk of each release, I actually think less releases will reduce risk. I'm speaking from an operations perspective as opposed to a development perspective. -> -> The current Kubernetes release cadence is so high that most organizations cannot keep up with making a major version update every 3 months regularly, or going out of security support in less than a year. 
While in theory, releasing more frequently reduces the churn and risk of each release, this is only true if end users are actually able to apply the upgrades. -> -> In my experience, this is very challenging and I have not yet seen any organization consistently keep up with the 3 month major upgrade pace for production clusters, especially at a large scale. So, what effectively happens is that end users upgrade less frequently than 3 months, but since that isn't supported, they end up in the situation where they are required to jump multiple major releases at once, which effectively results in much higher risk. -> -> 4 vs. 3 releases is >30% more release work, but I do not believe it provides benefit proportional to that work, nor does a quarterly major release cadence match the vast majority of operation teams' upgrade cycles. - #### Data @jberkus: @@ -322,7 +284,15 @@ List the specific goals of the KEP. What is it trying to achieve? How will we know that this has succeeded? --> -#### FIXME Does that mandate a fixed frequency? +#### TODO Deterministic + +@akutz: + +> I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. So a big +1 from me. + +TODO: Stick to schedule and just not cut a 4th release? + +TODO: Does that mandate a fixed frequency? Thoughts: Roughly, yes. @@ -333,10 +303,34 @@ Roughly, yes. As a consumer, I'd be looking for some predictability in the schedule. -#### FIXME Releases don’t necessarily have to be equally spaced +TODO: Releases don’t necessarily have to be equally spaced See point on predictability. 
+#### TODO Reduce risk + +@Klaven: + +> I see some people attributing drift to longer release cycles (we are only talking about extending them by a month, not 3 months), but I would argue that fast release cycles have caused their own amount of drift, never mind the burden on the release team. +> +> Look at GKE, for example. Versions 1.14 to 1.17 are supported. GKE is arguably one of the best Kubernetes providers and there is a LOT of drift because corporations don't like continuous rapid change and find it hard to support. I also think that as a project matures the rate of change of the increasingly stable and feature-complete core should decrease. At some point the plugins and the out-of-tree projects should be where more change happens. Projects like cluster-api and the like get the attention which used to be focused on maturing the core. +> +> I know that recently there has been a lot of focus on how many releases something needs in order to become GA. I think that honestly is the wrong approach. +> +> I do think it's valid to be concerned that the release of k8s is too much work. I would say this means this system is too laborious. Given that it is this much work, rushing it more would probably only hurt us more. It's obvious that the ecosystem has already felt this strain. If we want to be able to release frequently, we need the release process to become painless. If we don't fix that problem, I don't see any solution other then pushing releases to a manageable cadence. +> +> If we want to release quickly, we need to think not only about the release team, but also the downstream; the adopters. If we want to release quickly and frequently, then we need to focus on making the upgrade process even easier and similar things. + +@ehashman: + +> While some folks in the thread note that this increases the heft/risk of each release, I actually think less releases will reduce risk. I'm speaking from an operations perspective as opposed to a development perspective. 
+> +> The current Kubernetes release cadence is so high that most organizations cannot keep up with making a major version update every 3 months regularly, or going out of security support in less than a year. While in theory, releasing more frequently reduces the churn and risk of each release, this is only true if end users are actually able to apply the upgrades. +> +> In my experience, this is very challenging and I have not yet seen any organization consistently keep up with the 3 month major upgrade pace for production clusters, especially at a large scale. So, what effectively happens is that end users upgrade less frequently than 3 months, but since that isn't supported, they end up in the situation where they are required to jump multiple major releases at once, which effectively results in much higher risk. +> +> 4 vs. 3 releases is >30% more release work, but I do not believe it provides benefit proportional to that work, nor does a quarterly major release cadence match the vast majority of operation teams' upgrade cycles. + #### TODO Create data Elana: @@ -505,7 +499,7 @@ Daniel: > the concern about things "taking longer" to go stable because of # of releases in beta also came up again, can we think of a way to handle this? -#### FIXME Comment, without decision +#### TODO Further decoupling core @sftim: @@ -513,21 +507,7 @@ Daniel: > > A bit more decoupling, now that the investment is made to enable that, sounds good to me - and allows for minor releases of Kubernetes itself to become less frequent. -#### FIXME Needs response - -@aojea: - -> 3 releases is cool for development, but not for releasing something with a minimum level of quality. -> We barely keep with the tech debt we have in CI and testing, ie, how many jobs are failing for years that nobody noticed?, how many bugs are open for years? how many features are in alpha,beta for years? 
-> Each release cycle force people to FIX things if they want to release, the more time to release the more technical debt that you accumulate. -> At least in all my life I never see a project that reducing the release cycle you don't end rushing everything for last week and honestly, I gave up believing that will be real some time. - -### FIXME Explanatory - -@MIhirMishra: - -> What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA. -> More important is what is in the release than how frequently you are releasing. +#### TODO Modifying SIG Architecture policies @johnbelamaric: @@ -635,6 +615,13 @@ https://www.cncf.io/certification/software-conformance/ > And as noted over Twitter, given someone's concerns on expecting same amount of changes over 25% less releases, I think it's of paramount importance for SIGs to step up and limit the things they include in a release, balancing what matters short/ long-term and kicking out all that can be done outside of the release cycle (we have CRDs, custom API servers, scheduling plug-ins, and so on). Now, I understand it's hard, sometimes even painful, to manage the enthusiasm some like me have on things close to them they want to see gaining traction but the early days are gone and this is now a solid OSS project that requires mature contributors. +@aojea: + +> 3 releases is cool for development, but not for releasing something with a minimum level of quality. +> We barely keep with the tech debt we have in CI and testing, ie, how many jobs are failing for years that nobody noticed?, how many bugs are open for years? how many features are in alpha,beta for years? +> Each release cycle force people to FIX things if they want to release, the more time to release the more technical debt that you accumulate. 
+> At least in all my life I never see a project that reducing the release cycle you don't end rushing everything for last week and honestly, I gave up believing that will be real some time. + #### TODO SIG Release members - Reduce management overhead for SIG Release / Release Engineering From e02b8dd565d6de2472dab14c1acad25c838661f7 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 04:13:47 -0400 Subject: [PATCH 09/34] 2572-release-cadence: Move feature graduation data to correct heading Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 182 +++++++++--------- 1 file changed, 93 insertions(+), 89 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index edc17932185..c88c90871c6 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -48,7 +48,6 @@ SIG Architecture for cross-cutting KEPs). - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [TODO Motivation](#todo-motivation) - - [Data](#data) - [Goals](#goals) - [TODO Deterministic](#todo-deterministic) - [TODO Reduce risk](#todo-reduce-risk) @@ -60,6 +59,7 @@ SIG Architecture for cross-cutting KEPs). - [Non-Goals](#non-goals) - [TODO Release Team](#todo-release-team) - [TODO Enhancement graduation](#todo-enhancement-graduation) + - [Data](#data) - [TODO Further decoupling core](#todo-further-decoupling-core) - [TODO Modifying SIG Architecture policies](#todo-modifying-sig-architecture-policies) - [Proposal](#proposal) @@ -189,94 +189,6 @@ I'd prefer three releases/year. /milestone v1.20 /priority important-longterm -#### Data - -@jberkus: - -> All: I'm going to research what actual feature trajectory looks like through Kubernetes, because @johnbelamaric has identified that as a critical question. Stats to come. - -@ehashman: - -> @jberkus Any updates on the stats? 
:) - -@jberkus: - -> Nope, got bogged down with other issues, and the question of "what is a feature" in Kubernetes turns out to be a hard one to answer. We don't actually track features outside of a single release cycle; we track KEPs, which can either be part of a feature or the parent of several features, but don't match up 1:1 as features. So first I need to invent a way to identify "features" in a way that works for multiple release cycles. - -@jberkus - -> Sorry this has been forever, but answering the question of "how fast do our features advance" turns out to be really hard, because there is literally no object called a "feature" that persists reliably through multiple releases. -> -> To reduce the scope of the problem, I decided to limit this to tracked Enhancements introduced as alpha in 1.12 or 1.13, which were particularly fruitful releases for new features. Limiting it to Tracked kind of limits it to larger features, but I think these are the only ones required to go through alpha/beta/stable anyway (yes/no?). So, in 1.12 and 1.13: -> -> 20 new enhancements were introduced -> 7 did not follow a alpha/beta/stable path, mostly because the were removed or broken up into other features -> 2 are still beta -> 1 advanced in minimum time, that is 1 release alpha, 1 beta, then stable, in 9 months -> 4 advanced from alpha to beta in 1 release, but then took 2 or more releases to go to stable -> 7 advanced more slowly -> -> Our median number of releases for an enhancement to progress is: -> -> Alpha to Beta: 2 releases -> Beta to Stable: 3 releases -> Alpha to Stable: 6 releases -> -> Given this, it does not look like moving to 3 releases a year would slow down feature development due to the alpha/beta/stable progression requirements. -> -> I will note, however, that for many enhancements nagging by the Release Team during each release cycle did provide a goad to resume stalled development. 
So I'm at least a little concerned that if we make 3/year and don't change how we monitor feature progression at all, it will still slow things down because devs are being nagged less often. - -@ehashman - -> I am a little less worried about the "nag factor" now that we've moved to push-driven enhancements this release (SIG leads track, enhancements team accepts vs. enhancements team tracks). - -@jberkus - -> I'm very worried about it, see new thread. - -@johnbelamaric - -> I am more concerned about the "nag factor" due to the move to push-driven -> development; I think the lack of nagging will slow some features down. -> However, there are new things at play that will help with it - namely, the -> "no more permabeta" where things can't linger in beta forever because their -> API gets turned off automatically. At least for things with APIs. -> -> I strongly believe our alpha-to-stable latency, already quite high, will -> get worse with 3 releases per year. But ultimately, it's up to the feature -> authors. If they want it to go fast enough, they'll have to push for it -> more. Missing a train will have higher cost. -> -> Anyway, I personally am, as I said before, ambivalent on this decision. -> Putting on my GKE hat, it makes my life easier. Putting on my OSS hat, I -> have concerns but nothing that would make me strongly oppose it. It's not a -> change I would push for, but I think it's reasonable to see what happens if -> we try it. -> -> And thank you Josh for the analysis. The median 6 releases goes from 18 -> months to 24 months, which is not great but also something that is not -> forced on feature authors - they could push and get it done in less time if -> they need to. It's a rare feature that was making it in the 9 months -> before, so it would be a rare feature that is forced to have a longer cycle -> than they would have otherwise. 
- -@jberkus - -> I'm going to start a new thread on nag factors, because these are a bigger deal than I think folks want to address. -> -> There are two areas in the project, currently, that are almost entirely dependent on release team nagging (herafter RTN) for development: -> -> Getting features to GA (and to a lesser degree, to beta) -> Fixing test failures and flakes -> -> With the current 3-month cycle, this means that for 1 month of every cycle RTN doesn't happen, and as a result, these two activities don't happen. This is an extremely broken system. Contributors should not be dependent on nagging of any sort to take care of these project priorities, and the project as a whole shouldn't be depending on the RT for them except during Code Freeze. -> -> A 4-month cycle will make this problem worse, because we'll be looking a 2 months of every cycle where RTN won't happen, a doubling of the amount of time per year for tests to fail and alpha features to be forgotten. -> -> I am not saying that this is a reason NOT to do a 4-month cycle. I am saying that switching to a 4-month cycle makes fixing our broken RTN-based system an urgent priority. Fixing failing tests needs to happen year round. Reviewing features for promotion needs to happen at every SIG meeting. -> -> (FWIW, this is an issue that every large OSS project faces, which is why Code Freeze is to awful in so many projects) - ### Goals -### [Leads meeting](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr) feedback +### Leads meeting feedback session + +Already captured above, but you can find meeting notes [here](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr). 
## Drawbacks From 233c2ccd040db0009332752a5ef56d8415b36608 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 08:41:43 -0400 Subject: [PATCH 11/34] 2572-release-cadence: Mark Release Team shadow selection as out-of-scope Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 34 +++++-------------- 1 file changed, 9 insertions(+), 25 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index c48c4fa5a03..52992e681dd 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -57,11 +57,11 @@ SIG Architecture for cross-cutting KEPs). - [TODO Improve visibility](#todo-improve-visibility) - [TODO More policy](#todo-more-policy) - [Non-Goals](#non-goals) - - [TODO Release Team](#todo-release-team) - [TODO Enhancement graduation](#todo-enhancement-graduation) - [Data](#data) - [TODO Further decoupling core](#todo-further-decoupling-core) - [TODO Modifying SIG Architecture policies](#todo-modifying-sig-architecture-policies) + - [Determining an upper bound for Release Team shadows](#determining-an-upper-bound-for-release-team-shadows) - [Proposal](#proposal) - [User Stories (Optional)](#user-stories-optional) - [TODO End User](#todo-end-user) @@ -295,30 +295,6 @@ What is out of scope for this KEP? Listing non-goals helps to focus discussion and make progress. --> -#### TODO Release Team - -@saschagrunert: - -> On the other hand it will give less people a chance to participate in the shadowing program for example. Anyways, I think we will find appropriate solutions around those kind of new challenges. - -@jeremyrickard: - -> The one downside is that we will remove an opportunity for shadowing, and as we saw this time around we had >100 people apply, and this will remove ~24-ish opportunities. I think we can maybe identify some opportunities for folks that want to be involved though. 
takes off release lead hat - -@wilsonehusin: - -> as someone who started getting involved in this project through shadowing release team, I'd like to echo what @saschagrunert & @jeremyrickard raised above regarding shadow opportunities -- I'm glad we're acknowledging the downside and hope we can keep in mind to present other opportunities for folks to get involved! - -@kcmartin: - -> As to the potential for limiting shadow opportunities (mentioned by @jeremyrickard, @wilsonehusin, and others), I'm definitely tuned in to that being a downside, since I've served as a SIG-Release shadow three times, and I think it's a fantastic opportunity! -> -> One possible way to alleviate that downside would be to have 5 shadows, instead of three or four, per sub-team. I believe this is still a manageable number for the Leads, and could distribute the work more evenly. - -@pires: - -> On a more personal note, (@jeremyrickard wink, wink) I applied for release shadow believing I'd be picked given my past contributions to the project and my justification to be selected over others. Being rejected was a humbling experience and I'm happy to let you know I didn't lose any of the appetite to contribute. Others may feel differently but, then again, the project is maturing and so should the community. - #### TODO Enhancement graduation @johnbelamaric: @@ -488,6 +464,14 @@ Daniel: > > Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar higher for state transition - this dovetails with our goal per release. Another possibility is to classify features into low and high risk features. Low risk features could go straight to beta and skip the alpha phase, for example. +#### Determining an upper bound for Release Team shadows + +It was noted that fewer releases for the year would lead to fewer opportunities +to participate on the Release Team. 
+ +This will be discussed and eventually addressed in +https://github.com/kubernetes/sig-release/issues/1494. + ## Proposal @@ -464,6 +465,10 @@ Daniel: > > Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar higher for state transition - this dovetails with our goal per release. Another possibility is to classify features into low and high risk features. Low risk features could go straight to beta and skip the alpha phase, for example. +#### Establishing maintenance/stability releases + +Discussed [here](#end-of-year-maintenancestability-releases). + #### Determining an upper bound for Release Team shadows It was noted that fewer releases for the year would lead to fewer opportunities @@ -981,21 +986,19 @@ information to express the idea and why it was not acceptable. > > I don't know how that would affect our choice of major release cadence (isn't it orthagonal?) but it would be a great thing to do regardless. Also very hard. -### TODO Maintenance releases - -@youngnick: - -> I agree with @spiffxp that whatever we end up doing, we should acknowledge that calendar Q4 is substantially quieter than other quarters, with US Kubecon rolling into US Thanksgiving, rolling into the December festive season. -> -> I think that any plan to change the release cadence needs to take that as a prime consideration, whether it's keeping four releases a year and marking the Q4 one as minimal features, spreading three releases across the year, or some other solution. - -@jberkus: +### End-of-year maintenance/stability releases -> @spiffxp we've talked about making Q4 a "maintenance" release endlessly, but we've never actually implemented that. 
+Establishing a shorter maintenance/stability release at the end of the year has +been casually discussed at several points in the project, with the most recent +(at the time of writing) occurrence being +[here](https://github.com/kubernetes/sig-release/issues/809). -@jayunit100: +Nothing compelling has emerged from previous conversations to give cause to +establish maintenance/stability releases. -> Sounds like joshs comment is middle ground on the way to three : sure you get 4 releases but the fourth is only bug fixes, tests and stability . +Fixing bugs, stabilizing components, adding/deflaking tests, improving +documentation, and graduating features are activities that can and should +happen in a reasonably consistent manner throughout the year. ## Infrastructure Needed (Optional) From f625168869146f1546cd5d35a0c070cbeab88f65 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Mon, 15 Mar 2021 10:02:26 -0400 Subject: [PATCH 13/34] 2572-release-cadence: Mark accelerated release cycles out-of-scope Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 54 +++++++++---------- 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 7afa29b0065..9120d886690 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -61,6 +61,7 @@ SIG Architecture for cross-cutting KEPs). - [Data](#data) - [TODO Further decoupling core](#todo-further-decoupling-core) - [TODO Modifying SIG Architecture policies](#todo-modifying-sig-architecture-policies) + - [Accelerated release cycles](#accelerated-release-cycles) - [Establishing maintenance/stability releases](#establishing-maintenancestability-releases) - [Determining an upper bound for Release Team shadows](#determining-an-upper-bound-for-release-team-shadows) - [Proposal](#proposal) @@ -87,7 +88,7 @@ SIG Architecture for cross-cutting KEPs). 
- [Drawbacks](#drawbacks) - [Alternatives](#alternatives) - [TODO LTS](#todo-lts) - - [TODO Go faster](#todo-go-faster) + - [Releasing Kubernetes faster](#releasing-kubernetes-faster) - [End-of-year maintenance/stability releases](#end-of-year-maintenancestability-releases) - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) @@ -465,6 +466,10 @@ Daniel: > > Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar higher for state transition - this dovetails with our goal per release. Another possibility is to classify features into low and high risk features. Low risk features could go straight to beta and skip the alpha phase, for example. +#### Accelerated release cycles + +Discussed [here](#releasing-kubernetes-faster). + #### Establishing maintenance/stability releases Discussed [here](#end-of-year-maintenancestability-releases). @@ -952,39 +957,29 @@ information to express the idea and why it was not acceptable. > @youngnick you summed it up. Having 2 years of support for a specific API snapshot is unrealistic right now for all sorts of reasons, and it wasn't even clear that it was what people actually wanted. -### TODO Go faster +### Releasing Kubernetes faster -@sebgoa: +The intent of this proposal is to create more opportunities to provide a +high-value experience for Kubernetes consumers. -> IMHO with more releases, developers don't need to rush their features, upgrades a more bite size and it necessarily pushes for even more automation. -> -> So at the risk of being down voted I would argue that we have worked over the last 15 years to agree that "release early, release often" was a good idea, that creating a tight feedback loop with devs, testers and users was a very good idea. -> -> Theoretically we should have processes in place to be able to automatically upgrade and be able to handle even a higher cadence of releases. 
I could see a future were people don't upgrade that often because there are less releases and then start to fall behind one year, then two...etc. +The implication is that we as a community have a reasonable amount of tech debt +across infrastructure, testing, policy, and documentation that does not suggest +it would be feasible to spend more time releasing when we could be paying down +that debt. -@johnbelamaric: +SIG Release currently produces releases at the following cadence: -> I could see a future were people don't upgrade that often because there are less releases and then start to fall behind one year, then two...etc. -> -> Yes, this is a big fear of mine as well. We have worked hard to prevent vendor-based fragmentation (e.g., with conformance) and version-based fragmentation (with API round trip policies, etc). Bigger releases with riskier upgrades may undermine that work. We must avoid a Python2 -> 3 situation. This is also why we elected for a longer support cycle as opposed to an LTS. With the extensive ecosystem we have, fragmentation is extremely dangerous. -> -> I don't think going from 4->3 releases will create this problem, though I do think going to 2 or 1 release would. We need some plan around the mitigations I described earlier though, to ensure we avoid this fate. +- patch releases (`x.y.Z`): [monthly][patch-releases] +- minor releases (`x.Y.z`): [every four months][versioning] +- pre-releases (`x.y.0-(alpha|beta|rc).N`): every 1-3 weeks during active + development cycles ([example](https://git.k8s.io/sig-release/releases/release-1.21/README.md#timeline)) -@sftim: +At the time of writing, SIG Release considers these to be reasonable cadences +for patch and pre-releases. -> If there were an unsupported-but-tested Kubernetes release cut and published once a week - what would that mean? -> -> I'm imagining something that passes the conformance tests (little point otherwise) but comes with no guarantee. 
The Rust project has a model a bit like this with a daily unstable release which has nevertheless been through lots of automated testing. -> -> When I'm typing this I'm imagining that I could run minikube start --weekly-unstable and get a local test cluster based on the most recent release. If Kubernetes already had that built and working, would people pick different answers? - -@jberkus: - -> @sftim yeah, you've noticed that the reason, right now, we don't see a lot of community testing on alphas and betas is that we don't make them easy to consume. -> -> I'd say that it would need to go beyond that: we'd need images, minikube, and kubeadm releases for each weekly release. -> -> I don't know how that would affect our choice of major release cadence (isn't it orthagonal?) but it would be a great thing to do regardless. Also very hard. +If you'd like to provide feedback on longer-term improvements that maybe +accelerate production of releases, please join the discussion +[here](https://github.com/kubernetes/sig-release/discussions/1495) ### End-of-year maintenance/stability releases @@ -1007,3 +1002,6 @@ Use this section if you need things from the project/SIG. Examples include a new subproject, repos requested, or GitHub details. Listing these here allows a SIG to get the process for these resources started right away. 
--> + +[patch-releases]: https://git.k8s.io/sig-release/releases/patch-releases.md +[versioning]: https://git.k8s.io/sig-release/release-engineering/versioning.md From 15a626a94cb29a7419ca38dc78b94bf3bd2a94f9 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Tue, 16 Mar 2021 20:12:47 -0400 Subject: [PATCH 14/34] 2572-release-cadence: Mark LTS as out-of-scope Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 38 ++++++++----------- 1 file changed, 15 insertions(+), 23 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 9120d886690..052bce322b8 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -57,6 +57,7 @@ SIG Architecture for cross-cutting KEPs). - [TODO Improve visibility](#todo-improve-visibility) - [TODO More policy](#todo-more-policy) - [Non-Goals](#non-goals) + - [Long-term support (LTS) releases](#long-term-support-lts-releases) - [TODO Enhancement graduation](#todo-enhancement-graduation) - [Data](#data) - [TODO Further decoupling core](#todo-further-decoupling-core) @@ -87,7 +88,7 @@ SIG Architecture for cross-cutting KEPs). - [Leads meeting feedback session](#leads-meeting-feedback-session) - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) - - [TODO LTS](#todo-lts) + - [LTS](#lts) - [Releasing Kubernetes faster](#releasing-kubernetes-faster) - [End-of-year maintenance/stability releases](#end-of-year-maintenancestability-releases) - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) @@ -297,6 +298,10 @@ What is out of scope for this KEP? Listing non-goals helps to focus discussion and make progress. --> +#### Long-term support (LTS) releases + +Discussed [here](#lts). 
+ #### TODO Enhancement graduation @johnbelamaric: @@ -931,31 +936,17 @@ not need to be as detailed as the proposal, but should include enough information to express the idea and why it was not acceptable. --> -### TODO LTS - -@kellycampbell: - -> +1 One thing I'm wondering is how this would affect vendors and other downstream integrators. For example, we build our clusters using kops. It is normally at least one release behind the k8s releases. Would having an extra month help them sync up with the release? I imagine other integrators and cloud providers would also benefit from extra time to update. -> -> Additional point: we really are only able to upgrade our clusters about twice a year for various reasons not related to the k8s or kops release schedule. I see the maturing k8s as similar to OS upgrades such as Ubuntu which releases twice a year and have LTS every 4 releases or so. They are able to patch incrementally and continuously though. If k8s had a similar ability to apply incremental patches in a standard way such that 1.19.1 -> 1.19.2 is more or less automatic and not up to each vendor, that would be amazing. - -@chris-short: +### LTS -> I'm in favor of three releases a year. I like @jberkus comment about a Q4 maintenance release too without so much hoopla and fanfare. -> -> I hope folks aren't driven to only fix stuff in Q4, "Oh that's a gnarly one, wait until the end of the year." Is something I could foresee someone thinking at some point, if we don't word things right about a maintenance release. -> -> One question I think has been lightly touched on is, "What about LTS releases?" (and I know this is out of scope but, I don't know where we stand on this atm) - -@youngnick: +The LTS Working Group was [disbanded](https://github.com/kubernetes/community/pull/5240) +on October 20, 2020. -> The consensus on LTS (meaning multi-year support for a single version) is, in short, there's no consensus. 
We in the LTS WG worked for over two years, and we were able to get everyone to agree to extend the support window to one year (from nine months), which I think speaks to the passion that everyone has about this, the diversity of the use cases Kubernetes is supporting, and the community's determination to get it right. -> -> Speaking personally, I think that LTS is a long way away, if ever - it would require a lot more stability in the core than we have right now. With efforts like all the work to pull things out of tree, and the general movement towards adding new APIs outside of the core API group, I think it's plausible that one day, we may get to a place where we could consider it, but I don't think it's likely for some time, if ever. @tpepper, @jberkus, @LiGgit, and @dims among others may have thoughts here. :) - -@jberkus: +The outcome of their conversations was the proposal which established a +[yearly support period][yearly-support-kep] for minor releases of the project. -> @youngnick you summed it up. Having 2 years of support for a specific API snapshot is unrealistic right now for all sorts of reasons, and it wasn't even clear that it was what people actually wanted. +While we may revisit the idea in the future, for now, we trust the 2+ years of +thoughtful deliberation by the working group enough to conclude that the +project is not currently in a place to support long-term support releases. ### Releasing Kubernetes faster @@ -1005,3 +996,4 @@ SIG to get the process for these resources started right away. 
[patch-releases]: https://git.k8s.io/sig-release/releases/patch-releases.md [versioning]: https://git.k8s.io/sig-release/release-engineering/versioning.md +[yearly-support-kep]: /keps/sig-release/1498-kubernetes-yearly-support-period/README.md From e2d5d9692a8bd9022485446ff651327b928e0f19 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Wed, 24 Mar 2021 12:04:56 -0400 Subject: [PATCH 15/34] 2572-release-cadence: Group cleanup from the leadership team Signed-off-by: Stephen Augustus Co-authored-by: Sascha Grunert Co-authored-by: Lauri Apple Co-authored-by: Jeremy Rickard --- .../2572-release-cadence/README.md | 965 +++--------------- 1 file changed, 169 insertions(+), 796 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 052bce322b8..52a153b8196 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -47,51 +47,36 @@ SIG Architecture for cross-cutting KEPs). 
- [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) -- [TODO Motivation](#todo-motivation) +- [Motivation](#motivation) - [Goals](#goals) - - [TODO Deterministic](#todo-deterministic) - - [TODO Reduce risk](#todo-reduce-risk) - - [TODO Create data](#todo-create-data) - - [TODO Blocking upgrade tests](#todo-blocking-upgrade-tests) - - [TODO More automation](#todo-more-automation) - - [TODO Improve visibility](#todo-improve-visibility) - - [TODO More policy](#todo-more-policy) + - [Enhance determinism](#enhance-determinism) + - [Reduce risk](#reduce-risk) + - [Collecting data](#collecting-data) + - [Creating a policy](#creating-a-policy) - [Non-Goals](#non-goals) - [Long-term support (LTS) releases](#long-term-support-lts-releases) - - [TODO Enhancement graduation](#todo-enhancement-graduation) - - [Data](#data) - - [TODO Further decoupling core](#todo-further-decoupling-core) - - [TODO Modifying SIG Architecture policies](#todo-modifying-sig-architecture-policies) + - [Changing enhancements graduation](#changing-enhancements-graduation) + - [Architecture changes](#architecture-changes) + - [Modifying SIG Architecture policies](#modifying-sig-architecture-policies) - [Accelerated release cycles](#accelerated-release-cycles) - [Establishing maintenance/stability releases](#establishing-maintenancestability-releases) - [Determining an upper bound for Release Team shadows](#determining-an-upper-bound-for-release-team-shadows) - [Proposal](#proposal) - - [User Stories (Optional)](#user-stories-optional) - - [TODO End User](#todo-end-user) - - [TODO Distributors and downstream projects](#todo-distributors-and-downstream-projects) - - [TODO Contributors](#todo-contributors) - - [TODO SIG Release members](#todo-sig-release-members) - - [TODO @neolit123](#todo-neolit123) - - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [User Stories](#user-stories) + - [End User](#end-user) + - [Distributors and downstream 
projects](#distributors-and-downstream-projects) + - [Contributors](#contributors) + - [SIG Release members](#sig-release-members) - [Risks and Mitigations](#risks-and-mitigations) - - [TODO Concentrating risk](#todo-concentrating-risk) - - [TODO Attention to tests](#todo-attention-to-tests) - - [TODO Attention to dependencies](#todo-attention-to-dependencies) + - [Concentrating risk](#concentrating-risk) + - [Attention to tests](#attention-to-tests) + - [Attention to dependencies](#attention-to-dependencies) - [Design Details](#design-details) - - [TODO Schedule](#todo-schedule) - - [TODO Impact to existing deadlines](#todo-impact-to-existing-deadlines) - - [Test Plan](#test-plan) - - [Graduation Criteria](#graduation-criteria) - - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - - [Version Skew Strategy](#version-skew-strategy) + - [Schedule Policy](#schedule-policy) - [Implementation History](#implementation-history) - [Leads meeting feedback session](#leads-meeting-feedback-session) - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) - - [LTS](#lts) - - [Releasing Kubernetes faster](#releasing-kubernetes-faster) - - [End-of-year maintenance/stability releases](#end-of-year-maintenancestability-releases) -- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) ## Release Signoff Checklist @@ -134,776 +119,230 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary - - -## TODO Motivation - - - -What would you like to be added: - -We should formally discuss whether or not it's a good idea to modify the kubernetes/kubernetes release cadence. -Why is this needed: - -The extended release schedule for 1.19 will result in only three minor Kubernetes releases for 2020. - -As a result, we've received several questions across a variety of platforms and DMs about whether the project is intending to only have three minor releases/year. 
- -In an extremely scientific fashion, I took this question to a Twitter poll to get some initial feedback: https://twitter.com/stephenaugustus/status/1305902993095774210?s=20 - -Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four. - -There's quite a bit of feedback to distill from that thread, so let's start aggregating opinions here. +With this KEP, SIG Release proposes to change the current Kubernetes release +cadence from 4 down to 3 releases per year. -Strictly my personal opinion: -I'd prefer three releases/year. +## Motivation - less churn for external consumers - one less quarterly Release Team to recruit for - for a lot of folx, there are usually multiple things happening at the quarterly boundary which a new Kubernetes release can steal focus from - we can collectively use the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts +Discussions around changing the release cadence for Kubernetes, which +currently releases 4 times per year, are ongoing in the community. -@kubernetes/sig-release @kubernetes/sig-architecture @kubernetes/sig-testing -/assign -/milestone v1.20 -/priority important-longterm +The extended release schedule for 1.19 resulted in only three minor +Kubernetes releases for 2020. As a result, SIG Release received several +questions across a variety of platforms and communication channels about whether +the project intends to only have three minor releases/year. ### Goals - - -#### TODO Deterministic - -@akutz: - -> I strongly believe a deterministic and known schedule is more important than the frequency itself. Slowing down to three releases, as @justaugustus said, will provide three additional months for triage and the addressing of existing issues. This should help us to better meet planned release dates as there, in theory, should be fewer unknown-unknowns. So a big +1 from me. - -TODO: Stick to schedule and just not cut a 4th release? 
- -TODO: Does that mandate a fixed frequency? - -Thoughts: -Roughly, yes. - -- Release cycle -- Planning / stability phase -- Repeat - -As a consumer, I'd be looking for some predictability in the schedule. - -TODO: Releases don’t necessarily have to be equally spaced - -See point on predictability. - -#### TODO Reduce risk +#### Enhance determinism -@Klaven: +With the current release cadence we already achieve a deterministic schedule for +every year. The goal of this KEP is to increase this even further by providing a +lightweight policy around creating the release schedule. Going down to 3 +releases provides additional room for triage, development, and explicit breaks, +which should result in better overall planning and more predictability. -> I see some people attributing drift to longer release cycles (we are only talking about extending them by a month, not 3 months), but I would argue that fast release cycles have caused their own amount of drift, never mind the burden on the release team. -> -> Look at GKE, for example. Versions 1.14 to 1.17 are supported. GKE is arguably one of the best Kubernetes providers and there is a LOT of drift because corporations don't like continuous rapid change and find it hard to support. I also think that as a project matures the rate of change of the increasingly stable and feature-complete core should decrease. At some point the plugins and the out-of-tree projects should be where more change happens. Projects like cluster-api and the like get the attention which used to be focused on maturing the core. -> -> I know that recently there has been a lot of focus on how many releases something needs in order to become GA. I think that honestly is the wrong approach. -> -> I do think it's valid to be concerned that the release of k8s is too much work. I would say this means this system is too laborious. Given that it is this much work, rushing it more would probably only hurt us more. 
It's obvious that the ecosystem has already felt this strain. If we want to be able to release frequently, we need the release process to become painless. If we don't fix that problem, I don't see any solution other then pushing releases to a manageable cadence. -> -> If we want to release quickly, we need to think not only about the release team, but also the downstream; the adopters. If we want to release quickly and frequently, then we need to focus on making the upgrade process even easier and similar things. +#### Reduce risk -@ehashman: +With higher predictability we can reduce the overall risk of changing the release +schedule. The planning overhead of SIG Release gets reduced, while users of +Kubernetes gain more time to adapt to the latest release. -> While some folks in the thread note that this increases the heft/risk of each release, I actually think less releases will reduce risk. I'm speaking from an operations perspective as opposed to a development perspective. -> -> The current Kubernetes release cadence is so high that most organizations cannot keep up with making a major version update every 3 months regularly, or going out of security support in less than a year. While in theory, releasing more frequently reduces the churn and risk of each release, this is only true if end users are actually able to apply the upgrades. -> -> In my experience, this is very challenging and I have not yet seen any organization consistently keep up with the 3 month major upgrade pace for production clusters, especially at a large scale. So, what effectively happens is that end users upgrade less frequently than 3 months, but since that isn't supported, they end up in the situation where they are required to jump multiple major releases at once, which effectively results in much higher risk. -> -> 4 vs. 
3 releases is >30% more release work, but I do not believe it provides benefit proportional to that work, nor does a quarterly major release cadence match the vast majority of operation teams' upgrade cycles. +The current Kubernetes release cadence is so fast that most organizations cannot +keep up with regularly making a minor version update every 3 months or going out +of security support in a year. While releasing more frequently theoretically +reduces the churn and risk of each release, this is only true if end users are +actually able to apply the upgrades. -#### TODO Create data +#### Collecting data -Elana: +After this KEP is in place, SIG Release will follow up with a survey to collect +feedback about the new release cadence. -> additional ask: can we send out a real survey to end users +#### Creating a policy -Primarily anecdotal from SIG Release members, vendors, and end users. +The outcome of this KEP is a policy for creating release schedules for Kubernetes. +This allows the release team, as well as users, to follow a set of simple rules +when it comes to knowing when and how Kubernetes releases will be scheduled. -AI: - -- (to Elana) What kind of data specifically are we looking for? - - Who's the audience? End users or principals? - - Should we just do this all of the time post-release? -- (to Josh) What did we discover regarding feature trajectory? - -Thoughts: -I would want requesters to be very explicit about the kind of data we're interested in. -SIG Release and others can work on collection, but we need to make sure this isn't a continually moving target. - -We're also starting from a disadvantage trying to compare our status quo to something we haven't tried for a sustained period of time. +### Non-Goals -#### TODO Blocking upgrade tests +#### Long-term support (LTS) releases -Aaron C: +The LTS Working Group was +[disbanded](https://github.com/kubernetes/community/pull/5240) on October 20, +2020. 
-> Can we make upgrade jobs / tests blocking to make the upgrade between versions better +The outcome of their conversations was the proposal which established a +[yearly support period][/keps/sig-release/1498-kubernetes-yearly-support-period/README.md] +for minor releases of the project. -#### TODO More automation +While we may revisit the idea in the future, for now we trust the 2+ years of +thoughtful deliberation by the working group enough to conclude that the project +is not currently in a place to support long-term support releases. -@vincepri: +#### Changing enhancements graduation -> There is not enough automation (true for probably all our projects and repositories). +The way that enhancements are being graduated will not change with this KEP. It's +the responsibility of SIGs to keep track of their enhancements and +graduate them in the provided constraints of SIG Architecture. -#### TODO Improve visibility +The new release schedule will add room for only a few more weeks of development. +This means that SIGs can focus on using that time to enhance documentation and +testing (stability) over adding more features. -@vincepri: +Those decisions are not part of any SIG Release planning and will be considered +therefore as out of scope. -> Changes are hard to keep up with, and sometimes important things are buried in release notes. +#### Architecture changes -#### TODO More policy +Changing any architecture of Kubernetes, for example decoupling its core +components from the k/k repository, is out of scope of this KEP. -@vincepri: +#### Modifying SIG Architecture policies -> Some fear that less releases without policies (read: saying "no" more) isn't enough. +This non-goal corresponds partially to the [Changing enhancements +graduation](#changing-enhancements-graduation) section and broadens its scope +that any policy change made by SIG Architecture is out of scope of this KEP. 
-### Non-Goals +#### Accelerated release cycles - +The intent of this proposal is to create more opportunities to provide a +high-value experience for Kubernetes consumers. -#### Long-term support (LTS) releases +The implication is that we as a community have a reasonable amount of tech debt +across infrastructure, testing, policy, and documentation that does not suggest +it would be feasible to spend more time releasing when we could be paying down +that debt. -Discussed [here](#lts). +SIG Release currently produces releases at the following cadence: -#### TODO Enhancement graduation +- patch releases (`x.y.Z`): [monthly][https://git.k8s.io/sig-release/releases/patch-releases.md] +- minor releases (`x.Y.z`): [every four months][https://git.k8s.io/sig-release/release-engineering/versioning.md] +- pre-releases (`x.y.0-(alpha|beta|rc).N`): every 1-3 weeks during active + development cycles ([example](https://git.k8s.io/sig-release/releases/release-1.21/README.md#timeline)) -@johnbelamaric: +At the time of writing, SIG Release considers these to be reasonable cadences +for patch and pre-releases. -> I am tentatively in favor of 3 releases per year, primarily because I believe 4 releases per year is too hard for folks to consume. Even 3 releases per year is probably too much for most, but the downsides of fewer releases make anything less than 3 too risky in my mind. -> -> As I see it, those downsides are some things already mentioned above: -> -> Build up of too much content in the release, and consequent potential for more painful upgrades. -> Very long lead time to get a feature to GA through the alpha/beta/stable phases. -> -> Before making this decision, I think we need mitigations for these. Those mitigations have extensive ripples in how we do our development. 
-> -> For (1), some mitigations are: -> a) More development out-of-tree / decoupling more components -> b) SIGs saying "no" more -> c) Stricter admission criteria in the release (higher bar from SIG Release, SIG Testing, PRR, WG Reliability, SIG Scalability, etc.) -> -> Of course some of these mitigations might make (2) worse. Other ideas? -> -> For (2), we may want to thinking about how we do our feature graduation process. Having three stages to go through and one less release per year to do it will stretch out how long it takes to add a feature quite substantially. This is a big topic we wouldn't want to gate on, but we may want to have a plan for before moving forward. Some options others have mentioned to me: -> a) Features are in, or out. That is, straight to GA, but with features not admitted to the main build until they are ready. This means we need some alternate build and perhaps dev or feature branches. -> b) Two stages instead of three. I am very skeptical of this, but there is some support for this in how K8s is actually used today. We did an operator survey in the PRR subproject and we found that: -> -> More than 90% of surveyed orgs allow (by policy) all Beta and GA features in prod. -> Less than 10% of surveyed orgs have ever disabled a beta feature in prod. The caveat is that both operators with more than 10,000 nodes under management that answered the survey have done this. -> -> This would indicate that people already treat beat as GA, for the most part. That's not necessarily a good thing, but it is a fact. Of course, the big benefit for us as contributors is that betas can be changed if we have made a mistake. So again we probably wouldn't want to use the current bar for beta and just map it to GA. We would need to raise that quite a bit. -> -> With respect to alpha, the idea there is to gather feedback. I think we get some limited feedback with it, but is it enough? 
If alpha and beta are not really serving their intended purpose, are they really that useful, as currently defined? -> -> Another option for (2) is breaking up the monolith more and allowing components to release independently. However, this could make test coverage nearly impossible, as individual components would need to be tested in various versions. Given that K8s is operated independently by thousands of organizations, I don't think we can treat the core components as completely independent. Nonetheless there may be some opportunities for decomposition (like we did with CoreDNS, for example). @thockin mentioned kubelet and kube-proxy in this regard. -> -> See also (there are probably many more issues like these): -> -> kubernetes/community#567 -> kubernetes/community#4000 - -@jberkus: - -> For (2), we may want to thinking about how we do our feature graduation process. Having three stages to go through and one less release per year to do it will stretch out how long it takes to add a feature quite substantially. -> -> Does it really, though? How many features actually went from alpha to GA in 9 months? -> -> Does anyone have hard data on this? - -@johnbelamaric: - -> Does it really, though? How many features actually went from alpha to GA in 9 months? -> -> Does anyone have hard data on this? -> -> Ok, that's a fair point to challenge that assumption. I agree probably most don't do it in 9 months, but data would help. I am not sure if "months" is the right measure, though. My concern is that people will still take the same number of releases to make it happen, which means it will take longer. -> -> On a similar note, I am curious if there is any data on the amount of feedback alpha features actually get. - -@jberkus: - -> Yah, and "Is there some outstanding blocker that prevents new features from actually going from alpha to GA in 3 releases for any reason other than maturity?" -> -> That is, if a feature can go from alpha to GA in 3 releases, that's fine. 
That's a year, and do we really want to make the case that it should take less than a year to get a new feature to GA? BUT ... if something in our process means that features realistically can't be introduced in 1.23 and go to beta in 1.24, then we have a potential problem, because that timeline gets very long if it's gonna actually take you 5 releases. - -@johnbelamaric: - -> The two points I bring up are in tension with each other. That is, the same -> features spread across fewer annual releases automatically means more per -> release, or longer duration in the pipeline. - -@bowei: - -> We should make sure that there are sufficient improvements/metrics/goals to meet w/ any change (or no change). -> It wouldn't be great if 4 -> 3 didn't improve things and the same rationales would justify 3 -> 2. -> Is there a bar where we can comfortably go back from 3 -> 4? - -@onlydole: - -> +1 for three releases a year, and all of this discussion is fantastic! -> -> I agree with there being three releases a year. However, I do think that having more regular minor version releases would be helpful, so there isn’t any rush to get things into a specific release, nor a blocker around shipping bugfixes or improvements. -> -> I’d like to propose a strongly scoped path for Alpha, Beta, and GA features. I believe that allowing for a bit more leniency for Alpha and Beta code promotion and more stringent requirements for features before they make GA status. - -@jberkus: - -> @johnbelamaric everything you've said is valid. At the same time, though, my experience has been that the pressure goes the other way: features already linger in alpha or beta for way longer than they ought to. The push to get most features to GA -- or deprecate them -- really seems to be lacking. It's hard to pull stats for this, but most KEP-worthy features seem to take something like 2 years to get there. 
So from my perspective, more state changes per release would be a good thing (at least, more getting alpha features to beta/GA), even if we didn't change the number of releases per year. -> -> It's hard to tell whether or not switching to 3 releases a year would affect the slow pace of finishing features at all. - -Daniel: - -> the concern about things "taking longer" to go stable because of # of releases in beta also came up again, can we think of a way to handle this? - -##### Data - -@jberkus: - -> All: I'm going to research what actual feature trajectory looks like through Kubernetes, because @johnbelamaric has identified that as a critical question. Stats to come. - -@ehashman: - -> @jberkus Any updates on the stats? :) - -@jberkus: - -> Nope, got bogged down with other issues, and the question of "what is a feature" in Kubernetes turns out to be a hard one to answer. We don't actually track features outside of a single release cycle; we track KEPs, which can either be part of a feature or the parent of several features, but don't match up 1:1 as features. So first I need to invent a way to identify "features" in a way that works for multiple release cycles. - -@jberkus - -> Sorry this has been forever, but answering the question of "how fast do our features advance" turns out to be really hard, because there is literally no object called a "feature" that persists reliably through multiple releases. -> -> To reduce the scope of the problem, I decided to limit this to tracked Enhancements introduced as alpha in 1.12 or 1.13, which were particularly fruitful releases for new features. Limiting it to Tracked kind of limits it to larger features, but I think these are the only ones required to go through alpha/beta/stable anyway (yes/no?). 
So, in 1.12 and 1.13: -> -> 20 new enhancements were introduced -> 7 did not follow a alpha/beta/stable path, mostly because the were removed or broken up into other features -> 2 are still beta -> 1 advanced in minimum time, that is 1 release alpha, 1 beta, then stable, in 9 months -> 4 advanced from alpha to beta in 1 release, but then took 2 or more releases to go to stable -> 7 advanced more slowly -> -> Our median number of releases for an enhancement to progress is: -> -> Alpha to Beta: 2 releases -> Beta to Stable: 3 releases -> Alpha to Stable: 6 releases -> -> Given this, it does not look like moving to 3 releases a year would slow down feature development due to the alpha/beta/stable progression requirements. -> -> I will note, however, that for many enhancements nagging by the Release Team during each release cycle did provide a goad to resume stalled development. So I'm at least a little concerned that if we make 3/year and don't change how we monitor feature progression at all, it will still slow things down because devs are being nagged less often. - -#### TODO Further decoupling core - -@sftim: - -> I look forward to using the mechanisms already in place (notably CustomResourceDefinition, but also things like scheduling plugins) to enhance the Kubernetes experience outside of the minor release cycle. -> -> A bit more decoupling, now that the investment is made to enable that, sounds good to me - and allows for minor releases of Kubernetes itself to become less frequent. - -#### TODO Modifying SIG Architecture policies - -@johnbelamaric: - -> What is the need to decide in advance ? Release when it is ready for its level i.e. if it is ready for beta - release as beta and when ready for GA release as GA. -> -> Not sure I understand the question. No one is suggesting deciding in advance - features will be advanced in their stage when they are ready. But the thing is, "when they are ready" depends on the release cadence. 
In order to go to beta, a feature has to have at least one alpha. In fact, realistically there will be more than one release with it in alpha, since it's really difficult to get meaningful feedback with a single cycle of alpha. Arguably, this becomes a little easier with longer time between releases, but realistically going from alpha release, to availability in downstream products, to real usage, to feedback and updated design and development is pretty hard to squeeze in before code freeze for the next release. -> -> Another way to think about this is that every feature goes through three state transitions: -> -> inception -> alpha -> alpha -> beta -> beta -> GA -> -> Thus, the minimum number of releases to get from inception to GA is three - about 9 months now versus 12 with the proposed schedule. Now, it is the rare feature that would be able to do this in 9 months, because we general need more than one release of alpha, and probably for beta too to get a decent signal on quality. -> -> At the same time, a constant level of effort exerted on K8s would mean that the same number of features could have state transitions in the same amount of time. With fewer releases per year, that means more state transitions per release. -> -> That is, elongating the cycle creates: more content (transitions) per release, and longer time for a given feature to transition through all the states. Higher latency (in terms of time) with more throughput (in terms of releases). -> -> We don't want higher latency with more throughput. Because more throughput means riskier and more difficult upgrades and higher latency means more pain for developers and their customers. -> -> So, what are the mitigations? They amount to reducing the number of transitions per release (to address my (1) above) and reducing the number of transitions per feature (to address my (2) above). 
-> -> Reducing the transitions per release can be done by: -> -> peeling features off of the monolithic release by pushing them out-of-tree (decomposition) -> SIGs saying "no" more (which pushes things out-of-tree, most likely) -> Requiring a higher bar for a state transition, thus making the effort involved to get to the next stage higher -> -> Reducing the transitions per feature can only be done by changing our policy. In order to do that, we would certainly need to raise the bar higher for state transition - this dovetails with our goal per release. Another possibility is to classify features into low and high risk features. Low risk features could go straight to beta and skip the alpha phase, for example. +If you'd like to provide feedback on longer-term improvements that may +accelerate production of releases, please join the discussion +[here](https://github.com/kubernetes/sig-release/discussions/1495). -#### Accelerated release cycles +#### Establishing maintenance/stability releases -Discussed [here](#releasing-kubernetes-faster). +Establishing a shorter maintenance/stability release at the end of the year has +been casually discussed at several points in the project, with the most recent +(at the time of writing) occurrence being +[here](https://github.com/kubernetes/sig-release/issues/809). -#### Establishing maintenance/stability releases +Nothing compelling has emerged from previous conversations to give cause to +establish maintenance/stability releases. -Discussed [here](#end-of-year-maintenancestability-releases). +Fixing bugs, stabilizing components, adding/deflaking tests, improving +documentation, and graduating features are activities that can and should +happen in a reasonably consistent manner throughout the year. #### Determining an upper bound for Release Team shadows It was noted that fewer releases for the year would lead to fewer opportunities to participate on the Release Team.
-This will be discussed and eventually addressed in +This will be discussed and addressed in https://github.com/kubernetes/sig-release/issues/1494. ## Proposal - - -### User Stories (Optional) - - - -TODO: Add general note about state of the world and human challenges - -#### TODO End User - -- Hard to keep up with four releases / too much churn - - TODO: Get data from SIG Cluster Lifecycle - - One quarter where teams cannot focus on infra work - - 16 weeks with 4 weeks for holiday buffer works well - -@OmerKahani: - -> 3 is the maximum upgrades that we can do in my company. Our customers are eCommerce merchants, so from September to December (include), we are in a freeze on all of the infrastructure. - -#### TODO Distributors and downstream projects - -https://www.cncf.io/certification/software-conformance/ - -- Keeping up for both installers and cluster addons -- Cloud provider parity -- Less upgrades helps complex workloads - -@leodido: - -> +1 for 3 releases. -> -> Making users able to catch-up is more important than keeping a pace so fast that can lead nowhere (we experienced the same with https://github.com/falcosecurity/falco and we switched to 6 releases per year from 12). - -@afirth: +### User Stories -> I guess most end users are blocked by their upstream distro's ability to keep up with the K8s release. For example, GKE rapid channel is currently on 1.18, but 1.19 released in August. Somebody previously mentioned kops has similar issues (also currently on 1.18). I'm curious whether this is because those providers routinely find issues, or because it takes some fixed time to implement the new capabilities and changes. Either way, I don't think this change would impact end user's ability to get new features in a timely fashion much. +Kubernetes releases are made by real people. The technical aspects, for example +the release automation, reflect only a tiny part of the complete cycle.
This +means we will mainly focus on the human aspects and their corresponding roles +when deciding to move to a cadence of 3 releases per year. -@sebgoa: +#### End User -> Users and even cloud providers seem to struggle to keep up with the releases (e.g 1.18 is not yet available on GKE for instance), so this also seems to indicate that less releases would ease the work of users and providers. +Many end users and organizations struggle to upgrade Kubernetes 4 times a year. Providing +only 3 releases per year will ease this burden. -#### TODO Contributors +#### Distributors and downstream projects -- Time for project enhancements -- Time for feature development -- Time for planning / KEPs -- Time for health and well-being of tests -- Time for mental health / curtailing burnout -- Time for KubeCon execution -- Further show of maturity with less churn -- Saying "no" more +Downstream projects assemble their solutions from many different projects. Having +fewer upgrades helps them reduce complexity. For example, cloud providers +will gain more room for upgrading their infrastructure. -@vincepri: +#### Contributors -> Taking a step back, a few people are suggesting to fix some of these problem from a technical perspective (which is good in itself) and we should prioritize these efforts. From the other side, there is a general sentiment that we need to slow down for the sake of this community's health. -> -> These are both valid, and agreeing on a slower cadence is just the first step; going forward we should normalize taking a few steps back, reflect, and course-correct when things are becoming unsustainable. +With a lower release cadence, contributors will gain more time for project +enhancements, feature development, planning, and testing. It will also provide more +room for maintaining their mental health and preparing for events like KubeCon. -@pires: +With this proposal, SIG Release explicitly does not aim to push contributors +into doing more.
It is about giving contributors more flexibility to decide how to +invest their time. -> And as noted over Twitter, given someone's concerns on expecting same amount of changes over 25% less releases, I think it's of paramount importance for SIGs to step up and limit the things they include in a release, balancing what matters short/ long-term and kicking out all that can be done outside of the release cycle (we have CRDs, custom API servers, scheduling plug-ins, and so on). Now, I understand it's hard, sometimes even painful, to manage the enthusiasm some like me have on things close to them they want to see gaining traction but the early days are gone and this is now a solid OSS project that requires mature contributors. +#### SIG Release members -@aojea: - -> 3 releases is cool for development, but not for releasing something with a minimum level of quality. -> We barely keep with the tech debt we have in CI and testing, ie, how many jobs are failing for years that nobody noticed?, how many bugs are open for years? how many features are in alpha,beta for years? -> Each release cycle force people to FIX things if they want to release, the more time to release the more technical debt that you accumulate. -> At least in all my life I never see a project that reducing the release cycle you don't end rushing everything for last week and honestly, I gave up believing that will be real some time. - -#### TODO SIG Release members - -- Reduce management overhead for SIG Release / Release Engineering -- With the yearly support KEP, we only have three 3 releases to maintain -- One less quarterly Release Team to recruit for - -@sebgoa: - -> The kubernetes releases have been a strong point of the software since its inception. The quality, testing and general care has been amazing and only improved (my point of reference is releases of some apache foundation software). 
With the increased usage, scrutiny and complexity of the software it feels like each release is a huge effort for the release team so naturally less releases could mean a bit less work. - -@jberkus: - -> I will note, however, that for many enhancements nagging by the Release Team during each release cycle did provide a goad to resume stalled development. So I'm at least a little concerned that if we make 3/year and don't change how we monitor feature progression at all, it will still slow things down because devs are being nagged less often. - -@ehashman - -> I am a little less worried about the "nag factor" now that we've moved to push-driven enhancements this release (SIG leads track, enhancements team accepts vs. enhancements team tracks). - -@jberkus - -> I'm very worried about it, see new thread. - -@johnbelamaric - -> I am more concerned about the "nag factor" due to the move to push-driven -> development; I think the lack of nagging will slow some features down. -> However, there are new things at play that will help with it - namely, the -> "no more permabeta" where things can't linger in beta forever because their -> API gets turned off automatically. At least for things with APIs. -> -> I strongly believe our alpha-to-stable latency, already quite high, will -> get worse with 3 releases per year. But ultimately, it's up to the feature -> authors. If they want it to go fast enough, they'll have to push for it -> more. Missing a train will have higher cost. -> -> Anyway, I personally am, as I said before, ambivalent on this decision. -> Putting on my GKE hat, it makes my life easier. Putting on my OSS hat, I -> have concerns but nothing that would make me strongly oppose it. It's not a -> change I would push for, but I think it's reasonable to see what happens if -> we try it. -> -> And thank you Josh for the analysis. 
The median 6 releases goes from 18 -> months to 24 months, which is not great but also something that is not -> forced on feature authors - they could push and get it done in less time if -> they need to. It's a rare feature that was making it in the 9 months -> before, so it would be a rare feature that is forced to have a longer cycle -> than they would have otherwise. - -@jberkus - -> I'm going to start a new thread on nag factors, because these are a bigger deal than I think folks want to address. -> -> There are two areas in the project, currently, that are almost entirely dependent on release team nagging (herafter RTN) for development: -> -> Getting features to GA (and to a lesser degree, to beta) -> Fixing test failures and flakes -> -> With the current 3-month cycle, this means that for 1 month of every cycle RTN doesn't happen, and as a result, these two activities don't happen. This is an extremely broken system. Contributors should not be dependent on nagging of any sort to take care of these project priorities, and the project as a whole shouldn't be depending on the RT for them except during Code Freeze. -> -> A 4-month cycle will make this problem worse, because we'll be looking a 2 months of every cycle where RTN won't happen, a doubling of the amount of time per year for tests to fail and alpha features to be forgotten. -> -> I am not saying that this is a reason NOT to do a 4-month cycle. I am saying that switching to a 4-month cycle makes fixing our broken RTN-based system an urgent priority. Fixing failing tests needs to happen year round. Reviewing features for promotion needs to happen at every SIG meeting. -> -> (FWIW, this is an issue that every large OSS project faces, which is why Code Freeze is to awful in so many projects) - -##### TODO @neolit123 - -> Of the 709 votes, 59.1% preferred three releases over our current, non-2020 target of four. -> -> i find these results surprising. 
we had the same question in the latest SIG CL survey and the most picked answer was "not sure". this tells me that users possibly do not understand all the implications or it does not matter to them. -> -> a couple of benefits that i see from the developer side: -> -> with the yearly support KEP we get 3 releases to maintain -> less e2e test jobs -> -> as long as SIG Arch are on board we should just proceed with the change. - -### Notes/Constraints/Caveats (Optional) - - +With a cadence of 3 releases per year, SIG Release members will have +reduced management overhead. There will also be only 3 release branches to maintain with patch releases, +whereas currently up to 4 can overlap. SIG Release will gain more time to ensure a +seamless transition from the previous release team to the next one. It is also +possible to include more shadows if the role leads conclude that this is +appropriate. ### Risks and Mitigations - - -#### TODO Concentrating risk - -@adrianotto: +#### Concentrating risk -> -1 -> -> I acknowledge this proposed change will not slow the rate of change, but it does concentrate risk. It means that each release would carry more change, and more risk. It also means that adoption of those features will be slower, and that's bad for users. -> -> Release early and release often. This philosophy is a key reason k8s matured as quickly as it did. I accept that 2020 is a strange year, and should be handled as such. That is not a valid reason to change what is done in subsequent years. Each time you make a change like this, it has a range of unintended consequences, such as the risk packing I mentioned above. It would be tragic to slow overall slowdown in the promotion of GA features because they transition based on releases, not duration in use. If the release process is burdensome, we should be asking how we can apply our creativity to make it easier, and reducing the release frequency might be one of several options.
But asking the question this way constrains us from looking at the bigger picture, and fully considering what will serve the community best. +In theory, a reduced release cadence will result in more changes per release. +This concentrates risk: risk that would usually be split +across 4 milestones will instead be split across 3. -@bowei: +SIG Release cannot mitigate this risk directly, but is able to track and +influence it during each release cycle. It is the responsibility of SIG +Release, together with SIG Testing and SIG Architecture, to identify new gaps and +issues in the release cadence and mitigate them on a case-by-case basis. -> Echoing Adrian's comment: -> -> I think releases are a nice forcing function towards stabilization and having less releases will increase drift in the extra time. -> Are we coupling feature(s) stabilization to release cadence too much? -> One fear is that the work simply going to be pushed rather than decrease, but now there are fewer "stabilization" points in the year. +#### Attention to tests -@spiffxp: +This KEP does not propose any change to the structure of the release cycle itself and assumes +the same durations for Code Freeze and Test Freeze. Under that assumption, the longer development period before Code Freeze +increases the risk of flakes and test failures accumulating. It will be the responsibility of +SIG Release, together with the CI Signal role, to mitigate this. Given an +overall release cycle extension of 3-4 weeks, we believe that SIG Release +is able to mitigate this risk over multiple releases. -> I'm a net -1 on 3 releases per year, but I understand I'm in the minority. Reducing the frequency of a risky/painful process does not naturally lead to a net reduction of pain or risk, and usually incentivizes increased risk. "Stabilize the patient" can be a good first step, but is insufficient on its own.
-> -> To @tpepper's question of implementation, if we go with 3 symmetric releases, I would suggest using the "extra time" as a tech debt / process debt paydown phase at the beginning of each release cycle. Somewhat like how we left the milestone restriction in place at the beginning of the 1.20 release cycle. This would provide opportunity to pay down tech debt / process debt that involves large refactoring or breaking changes, the sort of work that is actively discouraged during the code freeze leading up to a release. -> -> I may have too narrow a view, but I have concerns that an April / August / December cadence puts undue pressure to land in August. I'm thinking of industries that typically go through a seasonal freeze in Q4. Shifting forward by a month (January / May / September) or two (February, June, October) may relieve some of that pressure, though it does cause one release to straddle Q4/Q1 in an awkward way. -> -> Another option is to declare Q4 what it has been in practice, a quieter time during which we're not actually going to push hard on a release, but I don't think that works as well with 3 releases vs. 4. +#### Attention to dependencies -@sebgoa: - -> But, generally speaking less releases (or less frequent minor releases) will also mean that each release will pack more weight, which means it will need even more testing and it will make upgrades tougher. -> -> With less releases developers will tend to rush their features at the last minute to "get it in" because the next one will be further apart. - -#### TODO Attention to tests - -@jberkus: - -> Extra month for flakes/failures to get worse if nobody looks at them until Code Freeze -> -> I think risk this can be reduced if we adjust (increase) the code freeze period. -> -> Better, how about we keep on top of flakes even if it's not Code Freeze? - -@alculquicondor: - -> Better, how about we keep on top of flakes even if it's not Code Freeze? 
-> -> Absolutely, but it's not easy to enforce without hurting development of non-violators. - -@jberkus: - -> Shutting down merges is the nuclear option for preventing flakes and fails. We should be able to keep on top of them without resorting to that. But ... we're getting off topic here, unless folks think the "increased time to flake" is a blocker for this (I don't). - -#### TODO Attention to dependencies - -@jberkus: - -> Extra problems with upstream dependency patch support if our timing is bad +Having fewer releases introduces the risk of missing dependency updates, for +example golang upgrades whose timing does not line up with our releases. This has to be mitigated on a case-by-case basis, in +the same way as it is done today. ## Design Details - - -### TODO Schedule - -- Symmetric vs asymmetric -- Explicit months -- Considerations around KubeCon -- Link notional schedules -- Golang releases -- November/December breaks -- Other dependencies to consider? - -@tpepper: - -> Can folks comment on how they'd prefer this to look operationally? -> -> There are operational benefits to consumers in having to do less upgrades (even with the small risk that a longer cadence means each upgrade is bigger), but that's also somewhat depending on when those are presented. Should we maybe rotate the release points backwards or forwards to get away from having a release at a particular time when it's less consumable? We also need to consider how to make this beneficial or at least not super disruptive to community muscle memory. -> -> symmetric four months active dev each, the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts spread across that. Are there specific benefits to picking any of: -> releases in April, August, December? -> releases in March, July, November? -> releases in February, June, October? -> releases in January, May, September?
-> asymmetry with some explicit downtimes using the ~3 months we'd gain for things like triage/roadmap planning, personal downtime, stability efforts? Eg: late November, December, early January and also August have typically been slower times already. -> quiet December, dev activating more January through April release -> dev May through July release -> August stability scrubbing -> dev September through early December release -> some other explicit stability effort periods like taken in August 2020, formalized into the release cadence as a longer code freeze? -> -> Depending on how we lay out the annual calendar, we might also get some benefit (or complexity) relative to the golang release cycle and our need to update golang twice on average across the patch release lifecycle of each release. This may also compound relative to distributors release cadences, support lifecycles, their stabilization and lead time between content selection and release, and their balancing interoperability across a larger set of dependencies where those are tied to specific months on the annual calendar. - -@neolit123: - -> Golang has a "symmetric" model, so i think k8s should do the same. -> the "symmetric" choice however, would require more discipline and availability from contributors, so my vote here is to try "symmetric" and if it fails (maybe after one year) go "asymmetric". - -@mpbarrett: - -> Which months (ie April, August, Dec would be every 4 months from the beginning of a year) would be important for me as a user to know. - -@OmerKahani: - -> @tpepper for your question - the best month for us will be March, July, November-December. - -@khenidak: - -> @tpepper would be great if we add to this post typical kubecon(s) schedule, since most the community (those who are reviewing, approving, building, releasing changes) is also heavily engaged in these events. - -@jberkus: - -> My vote is for symmetric releases in April, August, and December. 
While it's tempting to make December an "off month", development does happen all the time, and if it's not on a release, what is it on? -> -> That would be a reason for my 2nd choice, which would be Symmetric March, July, November, which puts Slow December at the beginning of the cycle instead of the end. However, that's mainly a benefit for working around Kubecon November, and there's no good reason to believe that Kubecon will be happening in November 2 years from now; it might be September or October instead. - -#### TODO Impact to existing deadlines - -@cblecker: - -> We should also talk about how this may impact things like code freeze (longer feature freeze with only bug/scale/stability fixes?). - -### Test Plan - - - -### Graduation Criteria - - +2. The last Kubernetes release of a year should be finished by the middle of + December. -### Upgrade / Downgrade Strategy +3. A Kubernetes release cycle has a length of ~15 weeks. - +4. Events like KubeCon will be considered blocked from development and + decision making. SIG Release will also treat the week before and the week after the + event in the same way. -### Version Skew Strategy - - +5. There will be an explicit break of at least two weeks between release cycles. + This does not mean that no development can happen during that time, but rather + that SIG Release will use this time to hold the release retrospective and plan + for the next cycle. ## Implementation History @@ -924,76 +363,10 @@ Already captured above, but you can find meeting notes [here](https://docs.googl ## Drawbacks - +The main drawbacks of this KEP have been covered in the [Risks and +Mitigations](#risks-and-mitigations) section. ## Alternatives - - -### LTS - -The LTS Working Group was [disbanded](https://github.com/kubernetes/community/pull/5240) -on October 20, 2020. - -The outcome of their conversations was the proposal which established a -[yearly support period][yearly-support-kep] for minor releases of the project.
- -While we may revisit the idea in the future, for now, we trust the 2+ years of -thoughtful deliberation by the working group enough to conclude that the -project is not currently in a place to support long-term support releases. - -### Releasing Kubernetes faster - -The intent of this proposal is to create more opportunities to provide a -high-value experience for Kubernetes consumers. - -The implication is that we as a community have a reasonable amount of tech debt -across infrastructure, testing, policy, and documentation that does not suggest -it would be feasible to spend more time releasing when we could be paying down -that debt. - -SIG Release currently produces releases at the following cadence: - -- patch releases (`x.y.Z`): [monthly][patch-releases] -- minor releases (`x.Y.z`): [every four months][versioning] -- pre-releases (`x.y.0-(alpha|beta|rc).N`): every 1-3 weeks during active - development cycles ([example](https://git.k8s.io/sig-release/releases/release-1.21/README.md#timeline)) - -At the time of writing, SIG Release considers these to be reasonable cadences -for patch and pre-releases. - -If you'd like to provide feedback on longer-term improvements that maybe -accelerate production of releases, please join the discussion -[here](https://github.com/kubernetes/sig-release/discussions/1495) - -### End-of-year maintenance/stability releases - -Establishing a shorter maintenance/stability release at the end of the year has -been casually discussed at several points in the project, with the most recent -(at the time of writing) occurrence being -[here](https://github.com/kubernetes/sig-release/issues/809). - -Nothing compelling has emerged from previous conversations to give cause to -establish maintenance/stability releases. - -Fixing bugs, stabilizing components, adding/deflaking tests, improving -documentation, and graduating features are activities that can and should -happen in a reasonably consistent manner throughout the year. 
- -## Infrastructure Needed (Optional) - - - -[patch-releases]: https://git.k8s.io/sig-release/releases/patch-releases.md -[versioning]: https://git.k8s.io/sig-release/release-engineering/versioning.md -[yearly-support-kep]: /keps/sig-release/1498-kubernetes-yearly-support-period/README.md +The alternative approaches have been discussed in the [Non-goals](#non-goals) +section. From 92c1c2d892274d1fa6a3cb66f4dd98e6e4145aa5 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Wed, 24 Mar 2021 17:13:30 -0400 Subject: [PATCH 16/34] 2572-release-cadence: Add Lauri edits Signed-off-by: Stephen Augustus Co-authored-by: Lauri Apple --- .../2572-release-cadence/README.md | 69 +++++++++---------- 1 file changed, 33 insertions(+), 36 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 52a153b8196..1ba63c86cff 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -183,37 +183,34 @@ is not currently in a place to support long-term support releases. #### Changing enhancements graduation -The way that enhancements are being graduated will not change with this KEP. It's +This KEP will not change the way that enhancements are being graduated. It's the responsibility of SIGs to keep track of their enhancements and graduate them in the provided constraints of SIG Architecture. The new release schedule will add room for only a few more weeks of development. -This means that SIGs can focus on using that time to enhance documentation and -testing (stability) over adding more features. - -Those decisions are not part of any SIG Release planning and will be considered -therefore as out of scope. +SIGs should focus on using those additional weeks to enhance documentation and +testing (stability)—not on adding more features. These decisions are not part of +any SIG Release planning and will therefore be considered out of scope. 
#### Architecture changes -Changing any architecture of Kubernetes, for example decoupling its core -components from the k/k repository, is out of scope of this KEP. +Changing any architecture of Kubernetes—for example, decoupling its core +components from the k/k repository—is outside the scope of this KEP. #### Modifying SIG Architecture policies -This non-goal corresponds partially to the [Changing enhancements -graduation](#changing-enhancements-graduation) section and broadens its scope -that any policy change made by SIG Architecture is out of scope of this KEP. +Any policy change made by SIG Architecture is out of scope of this KEP. This +non-goal corresponds partially to the [Changing enhancements +graduation](#changing-enhancements-graduation) section. #### Accelerated release cycles The intent of this proposal is to create more opportunities to provide a high-value experience for Kubernetes consumers. -The implication is that we as a community have a reasonable amount of tech debt -across infrastructure, testing, policy, and documentation that does not suggest -it would be feasible to spend more time releasing when we could be paying down -that debt. +The Kubernetes community faces a reasonable amount of tech debt across infrastructure, +testing, policy, and documentation. This KEP proposes that we spend +more time paying down that debt. SIG Release currently produces releases at the following cadence: @@ -225,9 +222,9 @@ SIG Release currently produces releases at the following cadence: At the time of writing, SIG Release considers these to be reasonable cadences for patch and pre-releases. -If you'd like to provide feedback on longer-term improvements that maybe +If you'd like to provide suggestions on longer-term improvements that could potentially accelerate production of releases, please join the discussion -[here](https://github.com/kubernetes/sig-release/discussions/1495) +[here](https://github.com/kubernetes/sig-release/discussions/1495). 
#### Establishing maintenance/stability releases @@ -255,10 +252,10 @@ https://github.com/kubernetes/sig-release/issues/1494. ### User Stories -Kubernetes releases are made by real people. The technical aspects, for example -the release automation, reflects only a tiny part of the complete cycle. This +Kubernetes releases are made by real people. The technical aspects—for example, +the release automation—reflects only a tiny part of the complete cycle. This means we will mainly focus on the human aspects and their corresponding roles -when deciding to move to a 3 releases per year cadence. +when deciding to move to a 3-releases-per-year cadence. #### End User @@ -268,18 +265,18 @@ only 3 releases per year will relax this situation. #### Distributors and downstream projects Downstream projects assemble their solution from different projects. Having -fewer upgrades helps them to reduce the complexity. For example cloud providers +fewer upgrades helps them to reduce complexity. For example, cloud providers will gain more room for upgrading their infrastructure. #### Contributors -With a lower release cadence, contributors will gain more time for, project -enhancements, feature development, planning and testing. It will provide more +With a lower release cadence, contributors will gain more time for project +enhancements, feature development, planning, and testing. It will provide more room for maintaining their mental health and prepare for events like KubeCon. -Target of SIG Release with this proposal is to explicitly not push contributors -in doing more. It is about giving contributors more flexibility to decide how to -invest their time. +Through this proposal SIG Release's aim is to give contributors more flexibility +to decide how to invest their time. It is explicitly *not* to push contributors +in doing more. #### SIG Release members @@ -295,11 +292,11 @@ appropriate. 
#### Concentrating risk In theory a reduced release cadence will cause more changes for every release. -This means that there will be an increased risk, which would be usually split-up +This means that there will be an increased risk, which would usually be split up into 4 dedicated milestones rather than 3. SIG Release cannot mitigate this risk directly, but is able to track and -influence it during each release cycle. It's in the responsibility of SIG +influence it during each release cycle. It's the responsibility of SIG Release, together with SIG Testing and SIG Architecture, to identify new gaps and issues in the release cadence and mitigate them on a case-by-case basis. @@ -307,15 +304,15 @@ issues in the release cadence and mitigate them on a case-by-case basis. This KEP does not propose any change to the release cycle itself and assumes that the same periods for Code and Test Freeze. Assuming that, there is an -increased risk for flakes and test failures. It will be in the responsibility of -SIG Release to mitigate together with the CI signal role. If we speak about an +increased risk for flakes and test failures. It will be the responsibility of +SIG Release to mitigate this, together with the CI signal role. If we speak about an overall release cycle enhancement of 3-4 weeks, then we believe that SIG Release is able to mitigate this risk over multiple releases. #### Attention to dependencies -Having less releases will introduce the risk of missing dependencies, for -example golang upgrades. This has to be mitigated on a case-by-case basis, in +Having fewer releases will introduce the risk of missing dependencies—for +example, Golang upgrades. This has to be mitigated on a case-by-case basis, in the same way as it is being done right now. ## Design Details @@ -336,12 +333,12 @@ is defined as: 3. A Kubernetes release cycle has a length of of ~15 weeks. 4. Events like KubeCon will be considered as blocked from development or - decision making. 
SIG Release will also consider the week before and after the + decision-making. SIG Release will also consider the week before and after the event in the same way. -5. Providing an explicit break of at least two weeks between each release cycle. - This does not mean that no development can happen during that time, but more - that SIG Release will use this time to do the release retrospective and plan +5. An explicit break of at least two weeks between each release cycle will be enforced. + This does not mean that zero development can happen during that time. Rather, + SIG Release will use this time to do the release retrospective and plan for the next cycle. ## Implementation History From 854ffb5161c0ef0b9c328f794810af2b7c75a150 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Thu, 25 Mar 2021 16:16:15 -0400 Subject: [PATCH 17/34] 2572-release-cadence: Line-wrap at 80-char Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 111 +++++++++--------- 1 file changed, 58 insertions(+), 53 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 1ba63c86cff..32449f74a84 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -124,35 +124,36 @@ cadence from 4 down to 3 releases per year. ## Motivation -Discussions around changing the release cadence for Kubernetes, which -currently releases 4 times per year, are ongoing in the community. +Discussions around changing the release cadence for Kubernetes, which currently +releases 4 times per year, are ongoing in the community. -The extended release schedule for 1.19 resulted in only three minor -Kubernetes releases for 2020. As a result, SIG Release received several -questions across a variety of platforms and communication channels about whether -the project intends to only have three minor releases/year. 
+The extended release schedule for 1.19 resulted in only three minor Kubernetes +releases for 2020. As a result, SIG Release received several questions across a +variety of platforms and communication channels about whether the project +intends to only have three minor releases/year. ### Goals #### Enhance determinism -With the current release cadence we already achieve a deterministic schedule for -every year. The goal of this KEP is to increase this even further by providing a -lightweight policy around creating the release schedule. Going down to 3 -releases provides additional room for triage, development, and explicit breaks, -which should result in better overall planning and more predictability. +With the current release cadence we already achieve a deterministic schedule +for every year. The goal of this KEP is to increase this even further by +providing a lightweight policy around creating the release schedule. Going +down to 3 releases provides additional room for triage, development, and +explicit breaks, which should result in better overall planning and more +predictability. #### Reduce risk -With higher predictability we can reduce the overall risk of changing the release -schedule. The planning overhead of SIG Release gets reduced, while users of -Kubernetes gain more time to adapt to the latest release. +With higher predictability we can reduce the overall risk of changing the +release schedule. The planning overhead of SIG Release gets reduced, while +users of Kubernetes gain more time to adapt to the latest release. -The current Kubernetes release cadence is so fast that most organizations cannot -keep up with regularly making a minor version update every 3 months or going out -of security support in a year. While releasing more frequently theoretically -reduces the churn and risk of each release, this is only true if end users are -actually able to apply the upgrades. 
+The current Kubernetes release cadence is so fast that most organizations +cannot keep up with regularly making a minor version update every 3 months or +going out of security support in a year. While releasing more frequently +theoretically reduces the churn and risk of each release, this is only true if +end users are actually able to apply the upgrades. #### Collecting data @@ -161,7 +162,8 @@ feedback about the new release cadence. #### Creating a policy -The outcome of this KEP is a policy for creating release schedules for Kubernetes. +The outcome of this KEP is a policy for creating release schedules for +Kubernetes. This allows the release team, as well as users, to follow a set of simple rules when it comes to knowing when and how Kubernetes releases will be scheduled. @@ -178,19 +180,20 @@ The outcome of their conversations was the proposal which established a for minor releases of the project. While we may revisit the idea in the future, for now we trust the 2+ years of -thoughtful deliberation by the working group enough to conclude that the project -is not currently in a place to support long-term support releases. +thoughtful deliberation by the working group enough to conclude that the +project is not currently in a place to support long-term support releases. #### Changing enhancements graduation This KEP will not change the way that enhancements are being graduated. It's -the responsibility of SIGs to keep track of their enhancements and -graduate them in the provided constraints of SIG Architecture. +the responsibility of SIGs to keep track of their enhancements and graduate +them in the provided constraints of SIG Architecture. -The new release schedule will add room for only a few more weeks of development. +The new release schedule will add room for only a few more weeks of +development. SIGs should focus on using those additional weeks to enhance documentation and -testing (stability)—not on adding more features. 
These decisions are not part of -any SIG Release planning and will therefore be considered out of scope. +testing (stability)—not on adding more features. These decisions are not part +of any SIG Release planning and will therefore be considered out of scope. #### Architecture changes @@ -199,7 +202,7 @@ components from the k/k repository—is outside the scope of this KEP. #### Modifying SIG Architecture policies -Any policy change made by SIG Architecture is out of scope of this KEP. This +Any policy change made by SIG Architecture is out of scope of this KEP. This non-goal corresponds partially to the [Changing enhancements graduation](#changing-enhancements-graduation) section. @@ -208,9 +211,9 @@ graduation](#changing-enhancements-graduation) section. The intent of this proposal is to create more opportunities to provide a high-value experience for Kubernetes consumers. -The Kubernetes community faces a reasonable amount of tech debt across infrastructure, -testing, policy, and documentation. This KEP proposes that we spend -more time paying down that debt. +The Kubernetes community faces a reasonable amount of tech debt across +infrastructure, testing, policy, and documentation. This KEP proposes that we +spend more time paying down that debt. SIG Release currently produces releases at the following cadence: @@ -222,8 +225,8 @@ SIG Release currently produces releases at the following cadence: At the time of writing, SIG Release considers these to be reasonable cadences for patch and pre-releases. -If you'd like to provide suggestions on longer-term improvements that could potentially -accelerate production of releases, please join the discussion +If you'd like to provide suggestions on longer-term improvements that could +potentially accelerate production of releases, please join the discussion [here](https://github.com/kubernetes/sig-release/discussions/1495). 
#### Establishing maintenance/stability releases @@ -274,17 +277,17 @@ With a lower release cadence, contributors will gain more time for project enhancements, feature development, planning, and testing. It will provide more room for maintaining their mental health and prepare for events like KubeCon. -Through this proposal SIG Release's aim is to give contributors more flexibility -to decide how to invest their time. It is explicitly *not* to push contributors -in doing more. +Through this proposal SIG Release's aim is to give contributors more +flexibility to decide how to invest their time. It is explicitly *not* to push +contributors in doing more. #### SIG Release members By applying a cadence of 3 releases per year, SIG Release members will gain a reduced management overhead. There are also only 3 patch releases to maintain, -which right now can overlap up to 4. SIG Release will gain more time to ensure a -seamless transition from the previous release team to the next one. It is also -possible to include more shadows if the role leads conclude that this is +which right now can overlap up to 4. SIG Release will gain more time to ensure +a seamless transition from the previous release team to the next one. It is +also possible to include more shadows if the role leads conclude that this is appropriate. ### Risks and Mitigations @@ -292,22 +295,22 @@ appropriate. #### Concentrating risk In theory a reduced release cadence will cause more changes for every release. -This means that there will be an increased risk, which would usually be split up -into 4 dedicated milestones rather than 3. +This means that there will be an increased risk, which would usually be split +up into 4 dedicated milestones rather than 3. SIG Release cannot mitigate this risk directly, but is able to track and -influence it during each release cycle. 
It's the responsibility of SIG -Release, together with SIG Testing and SIG Architecture, to identify new gaps and -issues in the release cadence and mitigate them on a case-by-case basis. +influence it during each release cycle. It's the responsibility of SIG Release, +together with SIG Testing and SIG Architecture, to identify new gaps and issues +in the release cadence and mitigate them on a case-by-case basis. #### Attention to tests This KEP does not propose any change to the release cycle itself and assumes that the same periods for Code and Test Freeze. Assuming that, there is an increased risk for flakes and test failures. It will be the responsibility of -SIG Release to mitigate this, together with the CI signal role. If we speak about an -overall release cycle enhancement of 3-4 weeks, then we believe that SIG Release -is able to mitigate this risk over multiple releases. +SIG Release to mitigate this, together with the CI signal role. If we speak +about an overall release cycle enhancement of 3-4 weeks, then we believe that +SIG Release is able to mitigate this risk over multiple releases. #### Attention to dependencies @@ -333,13 +336,15 @@ is defined as: 3. A Kubernetes release cycle has a length of of ~15 weeks. 4. Events like KubeCon will be considered as blocked from development or - decision-making. SIG Release will also consider the week before and after the - event in the same way. + decision-making. SIG Release will also consider the week before and after + the event in the same way. -5. An explicit break of at least two weeks between each release cycle will be enforced. - This does not mean that zero development can happen during that time. Rather, - SIG Release will use this time to do the release retrospective and plan - for the next cycle. +5. An explicit break of at least two weeks between each release cycle will be + enforced. + + This does not mean that zero development can happen during that time. 
+ Rather, SIG Release will use this time to do the release retrospective and + plan for the next cycle. ## Implementation History From 0502c943f29b6a3298249b2c64cceeeed42f61e4 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Thu, 25 Mar 2021 16:19:50 -0400 Subject: [PATCH 18/34] 2572-release-cadence: Mark as implementable w/ fake PRR Signed-off-by: Stephen Augustus --- keps/prod-readiness/sig-release/2572.yaml | 3 +++ keps/sig-release/2572-release-cadence/kep.yaml | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) create mode 100644 keps/prod-readiness/sig-release/2572.yaml diff --git a/keps/prod-readiness/sig-release/2572.yaml b/keps/prod-readiness/sig-release/2572.yaml new file mode 100644 index 00000000000..03f7109f180 --- /dev/null +++ b/keps/prod-readiness/sig-release/2572.yaml @@ -0,0 +1,3 @@ +kep-number: 2572 +alpha: + approver: "@johnbelamaric" diff --git a/keps/sig-release/2572-release-cadence/kep.yaml b/keps/sig-release/2572-release-cadence/kep.yaml index 8780c3e1a2c..8e7b3eea12e 100644 --- a/keps/sig-release/2572-release-cadence/kep.yaml +++ b/keps/sig-release/2572-release-cadence/kep.yaml @@ -6,7 +6,7 @@ owning-sig: sig-release participating-sigs: - sig-architecture - sig-testing -status: provisional +status: implementable creation-date: 2021-01-21 reviewers: - "@BenTheElder" From 03fe45e9c8fc8f9c39cac53c3ff0b7979188cbd6 Mon Sep 17 00:00:00 2001 From: Stephen Augustus Date: Fri, 26 Mar 2021 16:23:34 -0400 Subject: [PATCH 19/34] 2572-release-cadence: Add initial details about feedback survey Signed-off-by: Stephen Augustus --- .../2572-release-cadence/README.md | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 32449f74a84..b19969ca0fd 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -73,6 +73,7 @@ SIG Architecture for cross-cutting KEPs). 
- [Attention to dependencies](#attention-to-dependencies) - [Design Details](#design-details) - [Schedule Policy](#schedule-policy) + - [Feedback survey](#feedback-survey) - [Implementation History](#implementation-history) - [Leads meeting feedback session](#leads-meeting-feedback-session) - [Drawbacks](#drawbacks) @@ -346,6 +347,27 @@ is defined as: Rather, SIG Release will use this time to do the release retrospective and plan for the next cycle. +### Feedback survey + +Each minor Kubernetes release will be an experience survey, which will include +questions around the release cadence. + +Survey contents are to be determined, but we welcome content suggestions to +continually improve the process. + +Post-release surveys will close after the `.2` patch release to allow the team +sufficient time to process and incorporate feedback. + +Using Kubernetes v1.19 date to provide an example of the survey timeline: + +- 2020-08-26: v1.19.0 released (survey would go out) +- 2020-09-09: v1.19.1 released +- 2020-09-16: v1.19.2 released (survey would close) + +With this example, the survey would have been open for three weeks. +With an extended release cycle, post-release surveys would be open for around +three to six weeks (depending on the patch release schedule). + ## Implementation History - # KEP-2572: Defining the Kubernetes Release Cadence @@ -82,22 +38,6 @@ SIG Architecture for cross-cutting KEPs). ## Release Signoff Checklist - - -Items marked with (R) are required *prior to targeting to a milestone / release*. 
- - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) - [ ] (R) KEP approvers have approved the KEP status as `implementable` - [ ] (R) Design details are appropriately documented @@ -109,10 +49,6 @@ Items marked with (R) are required *prior to targeting to a milestone / release* - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes - - [kubernetes.io]: https://kubernetes.io/ [kubernetes/enhancements]: https://git.k8s.io/enhancements [kubernetes/kubernetes]: https://git.k8s.io/kubernetes @@ -370,17 +306,6 @@ three to six weeks (depending on the patch release schedule). ## Implementation History - - ### Leads meeting feedback session Already captured above, but you can find meeting notes [here](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr). From d4f878ad5e5c378138e75c2ec51ca98f1a0424df Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Wed, 7 Apr 2021 11:25:45 +0200 Subject: [PATCH 21/34] Apply review comments and fixups Signed-off-by: Sascha Grunert --- .../2572-release-cadence/README.md | 56 ++++++++----------- 1 file changed, 23 insertions(+), 33 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index f2dbe38c7f7..b7e237dec48 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -1,6 +1,7 @@ # KEP-2572: Defining the Kubernetes Release Cadence + - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) @@ -76,9 +77,9 @@ intends to only have three minor releases/year. 
With the current release cadence we already achieve a deterministic schedule for every year. The goal of this KEP is to increase this even further by providing a lightweight policy around creating the release schedule. Going -down to 3 releases provides additional room for triage, development, and -explicit breaks, which should result in better overall planning and more -predictability. +down to 3 releases provides additional room for triage, development, conference +and release cycle preparations, which should result in better overall planning +and more predictability. #### Reduce risk @@ -94,8 +95,9 @@ end users are actually able to apply the upgrades. #### Collecting data -After this KEP is in place, SIG Release will follow up with a survey to collect -feedback about the new release cadence. +After this KEP is in place and the first three minor (`1.x.0`) versions have +been released, SIG Release will follow up with a survey to collect feedback +about the new release cadence. #### Creating a policy @@ -109,11 +111,10 @@ when it comes to knowing when and how Kubernetes releases will be scheduled. #### Long-term support (LTS) releases The LTS Working Group was -[disbanded](https://github.com/kubernetes/community/pull/5240) on October 20, -2020. +[disbanded](https://github.com/kubernetes/community/pull/5240) on October 20, 2020. The outcome of their conversations was the proposal which established a -[yearly support period][/keps/sig-release/1498-kubernetes-yearly-support-period/README.md] +[yearly support period](/keps/sig-release/1498-kubernetes-yearly-support-period/README.md) for minor releases of the project. While we may revisit the idea in the future, for now we trust the 2+ years of @@ -129,7 +130,7 @@ them in the provided constraints of SIG Architecture. The new release schedule will add room for only a few more weeks of development.
SIGs should focus on using those additional weeks to enhance documentation and -testing (stability)—not on adding more features. These decisions are not part +testing (stability) - not on adding more features. These decisions are not part of any SIG Release planning and will therefore be considered out of scope. #### Architecture changes @@ -199,8 +200,8 @@ when deciding to move to a 3-releases-per-year cadence. #### End User -Most companies are facing issues upgrading Kubernetes 4 times a year. Providing -only 3 releases per year will relax this situation. +Most end user organizations find it difficult to match the Kubernetes release +cadence - only 3 releases per year will relax this situation. #### Distributors and downstream projects @@ -212,10 +213,11 @@ will gain more room for upgrading their infrastructure. With a lower release cadence, contributors will gain more time for project enhancements, feature development, planning, and testing. It will provide more -room for maintaining their mental health and prepare for events like KubeCon. +room for maintaining their mental health, preparing for events like KubeCon, or +working on downstream integration. Through this proposal SIG Release's aim is to give contributors more -flexibility to decide how to invest their time. It is explicitly *not* to push +flexibility to decide how to invest their time. It is explicitly _not_ to push contributors in doing more. #### SIG Release members @@ -251,7 +253,7 @@ SIG Release is able to mitigate this risk over multiple releases. #### Attention to dependencies -Having fewer releases will introduce the risk of missing dependencies—for +Having fewer releases will introduce the risk of missing dependencies — for example, Golang upgrades. This has to be mitigated on a case-by-case basis, in the same way as it is being done right now. @@ -273,11 +275,11 @@ is defined as: 3. A Kubernetes release cycle has a length of of ~15 weeks. 4.
Events like KubeCon will be considered as blocked from development or - decision-making. SIG Release will also consider the week before and after - the event in the same way. + decision-making from the SIG Release perspective. SIG Release will also + consider the week before and after the event in the same way. -5. An explicit break of at least two weeks between each release cycle will be - enforced. +5. An explicit SIG Release break of at least two weeks between each cycle will + be enforced. This does not mean that zero development can happen during that time. Rather, SIG Release will use this time to do the release retrospective and @@ -285,25 +287,13 @@ is defined as: ### Feedback survey -Each minor Kubernetes release will be an experience survey, which will include -questions around the release cadence. +SIG Release will draft an experience survey after the first three releases in +which the new cadence has been applied. This survey will include questions +around the release cadence and how it impacted end users. Survey contents are to be determined, but we welcome content suggestions to continually improve the process. -Post-release surveys will close after the `.2` patch release to allow the team -sufficient time to process and incorporate feedback. - -Using Kubernetes v1.19 date to provide an example of the survey timeline: - -- 2020-08-26: v1.19.0 released (survey would go out) -- 2020-09-09: v1.19.1 released -- 2020-09-16: v1.19.2 released (survey would close) - -With this example, the survey would have been open for three weeks. -With an extended release cycle, post-release surveys would be open for around -three to six weeks (depending on the patch release schedule).
- ## Implementation History ### Leads meeting feedback session From 995ab88fd967e7e9157d681ab881b2d950f9de97 Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Thu, 8 Apr 2021 09:52:44 +0200 Subject: [PATCH 22/34] Add note about feature graduation Signed-off-by: Sascha Grunert --- keps/sig-release/2572-release-cadence/README.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index b7e237dec48..4103c9e4349 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -1,7 +1,6 @@ # KEP-2572: Defining the Kubernetes Release Cadence - - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) @@ -28,6 +27,7 @@ - [Concentrating risk](#concentrating-risk) - [Attention to tests](#attention-to-tests) - [Attention to dependencies](#attention-to-dependencies) + - [Feature graduation](#feature-graduation) - [Design Details](#design-details) - [Schedule Policy](#schedule-policy) - [Feedback survey](#feedback-survey) @@ -257,6 +257,15 @@ Having fewer releases will introduce the risk of missing dependencies — for example, Golang upgrades. This has to be mitigated on a case-by-case basis, in the same way as it is being done right now. +#### Feature graduation + +Research discovered that only 5% of Kubernetes features advanced from Alpha to +GA in the minimum 3 releases. However, the same research showed that reminders +from the Release Team played a critical role in advancement of more than 50% of +features. With an increased release cycle, this reminder activity can be +expected to slow down. As such, advancement will need to be mitigated by making +sure that SIGs keep track of their feature enhancement in more detail. 
+ ## Design Details ### Schedule Policy From 90bc266feca7fd05a7a89f55d359552d556e7ea3 Mon Sep 17 00:00:00 2001 From: Jeremy Date: Thu, 8 Apr 2021 19:26:37 -0600 Subject: [PATCH 23/34] Apply additional suggestions and review feedback Signed-off-by: Jeremy --- keps/sig-release/2572-release-cadence/README.md | 11 ++++++++--- keps/sig-release/2572-release-cadence/kep.yaml | 1 + 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 4103c9e4349..c730f9d0251 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -58,7 +58,8 @@ ## Summary With this KEP, SIG Release proposes to change the current Kubernetes release -cadence from 4 down to 3 releases per year. +cadence from 4 down to 3 releases per year. This would start with the +Kubernetes 1.22 release cycle. ## Motivation @@ -155,8 +156,8 @@ spend more time paying down that debt. SIG Release currently produces releases at the following cadence: -- patch releases (`x.y.Z`): [monthly][https://git.k8s.io/sig-release/releases/patch-releases.md] -- minor releases (`x.Y.z`): [every four months][https://git.k8s.io/sig-release/release-engineering/versioning.md] +- patch releases (`x.y.Z`): [monthly](https://git.k8s.io/sig-release/releases/patch-releases.md) +- minor releases (`x.Y.z`): [every four months](https://git.k8s.io/sig-release/release-engineering/versioning.md) - pre-releases (`x.y.0-(alpha|beta|rc).N`): every 1-3 weeks during active development cycles ([example](https://git.k8s.io/sig-release/releases/release-1.21/README.md#timeline)) @@ -305,6 +306,10 @@ continually improve the process. ## Implementation History +### GitHub Discussion + +Prior to opening this KEP, a [Github Discussion](https://github.com/kubernetes/sig-release/discussions/1290) was opened to solicit community feedback, which was used as the basis for this KEP. 
+ ### Leads meeting feedback session Already captured above, but you can find meeting notes [here](https://docs.google.com/document/d/1Jio9rEtYxlBbntF8mRGmj6Q1JAdzZ9fTDo3ru1HK_LI/edit#bookmark=id.val5alfdahlr). diff --git a/keps/sig-release/2572-release-cadence/kep.yaml b/keps/sig-release/2572-release-cadence/kep.yaml index 8e7b3eea12e..1650b08edc0 100644 --- a/keps/sig-release/2572-release-cadence/kep.yaml +++ b/keps/sig-release/2572-release-cadence/kep.yaml @@ -12,6 +12,7 @@ reviewers: - "@BenTheElder" - "@derekwaynecarr" - "@dims" + - "@ehashman" - "@hasheddan" - "@jeremyrickard" - "@johnbelamaric" From 3fc92e1f920f6d4099a0b4f1b08dcd8d1266dd06 Mon Sep 17 00:00:00 2001 From: Jeremy Date: Thu, 8 Apr 2021 20:15:50 -0600 Subject: [PATCH 24/34] Update TOC Signed-off-by: Jeremy --- keps/sig-release/2572-release-cadence/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index c730f9d0251..da16aee4c6c 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -32,6 +32,7 @@ - [Schedule Policy](#schedule-policy) - [Feedback survey](#feedback-survey) - [Implementation History](#implementation-history) + - [GitHub Discussion](#github-discussion) - [Leads meeting feedback session](#leads-meeting-feedback-session) - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) From 0183839503c68bf49c37c81fcf3002767492341f Mon Sep 17 00:00:00 2001 From: Jeremy Date: Fri, 9 Apr 2021 16:41:22 -0600 Subject: [PATCH 25/34] Add tabular representation of before and after proposed change. 
Signed-off-by: Jeremy --- .../2572-release-cadence/README.md | 63 +++++++++++++++++++ 1 file changed, 63 insertions(+) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index da16aee4c6c..e32b3ece61f 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -193,6 +193,69 @@ https://github.com/kubernetes/sig-release/issues/1494. ## Proposal +The following tables detail a notional timeline for the remainder of 2021 and +for 2020, leveraging the historical *4-releases-per-year cadence*. Generally, +code freeze remains in effect until the last week of the release, so +development for the next release generally starts prior to the official release +team kickoff. A minimum of 1 week is needed between releases to fully form the +release team and to facilitate on-boarding of shadows. The fourth release of +the year has traditionally been compressed and limited in scope, overlapping +with of end of year holidays and vacation for many contributors. Additionally, +KubeCon normally occurs during at least one release, eliminating a week of +working time. 
+ +*Kubernetes Release Sechedule 2021 (Existing 4 Release Cadence)* + +| Year Week Number | Release Number | Release Week | Note | +| -------- | -------- | -------- | -------- | +| 2 | 1 | 1 (January 11) | | +| 14 | 1 | 13 (April 8) | | +| 16 | 2 | 1 (April 19) | | +| 27 | 2 | 11 (July 06) | One week break for KubeCon EU - 10 weeks of working | +| 29 | 3 | 1 (July 20) | | +| 40 | 3 | 11 (October 5) | | +| 42 | 4 | 1 (October 18) | | +| 52 | 4 | 10 (December 28) | End of Year Holidays | + +*Kubernetes Release Sechedule 2022 (Existing 4 Release Cadence)* + +| Year Week Number | Release Number | Release Week | Note | +| -------- | -------- | -------- | -------- | +| 1 | 1 | 1 (January 3) | | +| 12 | 1 | 12 (March 15) | | +| 14 | 2 | 1 (March 28) | Probable KubeCon EU | +| 26 | 2 | 12 (June 28) | | +| 28 | 3 | 1 (July 11) | | +| 40 | 3 | 12 (October 4) | | +| 42 | 4 | 1 (October 17) | Probably KubeCon NA | +| 52 | 4 | 10 (Dec 28) | | + +This KEP proposes a transition to a *3-releases-per-year cadence*, beginning +with the Kubernetes 1.22 Release. This would result in a *15* week release +cycle, with *2* weeks between release cycles. 
+ +*Kubernetes Release Sechedule 2021 (Proposed 3 Release Cadence)* + +| Year Week Number | Release Number | Release Week | Note | +| -------- | -------- | -------- | -------- | +| 2 | 1 | 1 (January 11) | | +| 14 | 1 | 13 (April 8) | | +| 17 | 2 | 1 (April 26) | | +| 32 | 2 | 15 (August 10) | KubeCon EU - 14 weeks of actual work| +| 35 | 3 | 1 (August 31) | | +| 50 | 3 | 15 (December 14) | Kubecon NA - 14 weeks of actual work | + +*Kubernetes Release Sechedule 2022 (Proposed 3 Release Cadence)* + +| Year Week Number | Release Number | Release Week | Note | +| -------- | -------- | -------- | -------- | +| 1 | 1 | 1 (January 3) | | +| 15 | 1 | 15 (April 12) | | +| 18 | 2 | 1 (May 2) | Probably KubeCon EU | +| 33 | 2 | 15 (August 15) | | +| 36 | 3 | 1 (September 6 | Probably KubeCon NA | +| 51 | 3 | 15 (December 20) | + ### User Stories Kubernetes releases are made by real people. The technical aspects—for example, From 9d65c63aee8a040cb6fad81d79a6d5488d16a29b Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Mon, 12 Apr 2021 10:29:12 +0200 Subject: [PATCH 26/34] Apply suggestions from code review Co-authored-by: Kirsten Co-authored-by: Josh Berkus --- keps/sig-release/2572-release-cadence/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index e32b3ece61f..334dc38cddc 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -194,13 +194,13 @@ https://github.com/kubernetes/sig-release/issues/1494. ## Proposal The following tables detail a notional timeline for the remainder of 2021 and -for 2020, leveraging the historical *4-releases-per-year cadence*. Generally, +for 2022, leveraging the historical *4-releases-per-year cadence*. 
Generally, code freeze remains in effect until the last week of the release, so development for the next release generally starts prior to the official release team kickoff. A minimum of 1 week is needed between releases to fully form the release team and to facilitate on-boarding of shadows. The fourth release of the year has traditionally been compressed and limited in scope, overlapping -with of end of year holidays and vacation for many contributors. Additionally, +with end of year holidays and vacation for many contributors. Additionally, KubeCon normally occurs during at least one release, eliminating a week of working time. @@ -234,7 +234,7 @@ This KEP proposes a transition to a *3-releases-per-year cadence*, beginning with the Kubernetes 1.22 Release. This would result in a *15* week release cycle, with *2* weeks between release cycles. -*Kubernetes Release Sechedule 2021 (Proposed 3 Release Cadence)* +*Kubernetes Release Schedule 2021 (Proposed 3 Release Cadence)* | Year Week Number | Release Number | Release Week | Note | | -------- | -------- | -------- | -------- | @@ -327,7 +327,7 @@ the same way as it is being done right now. Research discovered that only 5% of Kubernetes features advanced from Alpha to GA in the minimum 3 releases. However, the same research showed that reminders from the Release Team played a critical role in advancement of more than 50% of -features. With an increased release cycle, this reminder activity can be +features. With a longer release cycle, this reminder activity can be expected to slow down. As such, advancement will need to be mitigated by making sure that SIGs keep track of their feature enhancement in more detail. 
From 7e1b190911e206c09c2ec81183f3887f0abd6dc3 Mon Sep 17 00:00:00 2001 From: Jeremy Date: Wed, 14 Apr 2021 08:39:36 -0600 Subject: [PATCH 27/34] Move sig-testing and sig-arch chairs to approvers Signed-off-by: Jeremy --- keps/sig-release/2572-release-cadence/kep.yaml | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/kep.yaml b/keps/sig-release/2572-release-cadence/kep.yaml index 1650b08edc0..f94a1f453ab 100644 --- a/keps/sig-release/2572-release-cadence/kep.yaml +++ b/keps/sig-release/2572-release-cadence/kep.yaml @@ -9,18 +9,19 @@ participating-sigs: status: implementable creation-date: 2021-01-21 reviewers: - - "@BenTheElder" - - "@derekwaynecarr" - - "@dims" - "@ehashman" - "@hasheddan" - "@jeremyrickard" - - "@johnbelamaric" - - "@spiffxp" - - "@stevekuznetsov" + approvers: - - "@LappleApple" - - "@saschagrunert" + - "@BenTheElder" #sig-testing + - "@derekwaynecarr" #sig-architecture + - "@dims" #sig-architecture + - "@johnbelamaric" #sig-architecture + - "@LappleApple" #sig-release + - "@saschagrunert" #sig-release + - "@spiffxp" #sig-testing + - "@stevekuznetsov" #sig-testing # The target maturity stage in the current dev cycle for this KEP. 
stage: alpha From 3a1e6c41f21af243c3f48dcc0ce22a9faee83dd6 Mon Sep 17 00:00:00 2001 From: Jeremy Date: Wed, 14 Apr 2021 15:13:16 -0600 Subject: [PATCH 28/34] Update with review feedback and clarify alpha/beta/stable and official start Signed-off-by: Jeremy --- .../2572-release-cadence/README.md | 77 ++++++++++++++----- .../sig-release/2572-release-cadence/kep.yaml | 2 +- 2 files changed, 58 insertions(+), 21 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 334dc38cddc..dded92b0459 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -28,6 +28,7 @@ - [Attention to tests](#attention-to-tests) - [Attention to dependencies](#attention-to-dependencies) - [Feature graduation](#feature-graduation) + - [Unprepared Kubernetes Users](#unprepared-kubernetes-users) - [Design Details](#design-details) - [Schedule Policy](#schedule-policy) - [Feedback survey](#feedback-survey) @@ -60,7 +61,7 @@ With this KEP, SIG Release proposes to change the current Kubernetes release cadence from 4 down to 3 releases per year. This would start with the -Kubernetes 1.22 release cycle. +Kubernetes 1.23 release cycle. ## Motivation @@ -103,9 +104,8 @@ about the new release cadence. #### Creating a policy -The outcome of this KEP is a policy for creating release schedules for -Kubernetes. -This allows the release team, as well as users, to follow a set of simple rules +The outcome of this KEP is a written, lightweight policy for creating release schedules for +Kubernetes. This allows the release team, as well as users, to follow a set of simple rules when it comes to knowing when and how Kubernetes releases will be scheduled. ### Non-Goals @@ -193,6 +193,12 @@ https://github.com/kubernetes/sig-release/issues/1494. ## Proposal +This KEP proposes a transition to a *3-releases-per-year cadence*, beginning +with the Kubernetes 1.23 Release.
This would result in a *15* week release +cycle, with *2* weeks between release cycles. During the Kubernetes 1.22 release, +a focused communication effort will be undertaken to communicate to contributors and +the end user community. + The following tables detail a notional timeline for the remainder of 2021 and for 2022, leveraging the historical *4-releases-per-year cadence*. Generally, code freeze remains in effect until the last week of the release, so @@ -201,7 +207,7 @@ team kickoff. A minimum of 1 week is needed between releases to fully form the release team and to facilitate on-boarding of shadows. The fourth release of the year has traditionally been compressed and limited in scope, overlapping with end of year holidays and vacation for many contributors. Additionally, -KubeCon normally occurs during at least one release, eliminating a week of +KubeCon normally occurs during at least one release, eliminating at *least* one week of working time. *Kubernetes Release Sechedule 2021 (Existing 4 Release Cadence)* @@ -211,7 +217,7 @@ working time. | 2 | 1 | 1 (January 11) | | | 14 | 1 | 13 (April 8) | | | 16 | 2 | 1 (April 19) | | -| 27 | 2 | 11 (July 06) | One week break for KubeCon EU - 10 weeks of working | +| 27 | 2 | 11 (July 06) | One week break for KubeCon EU | | 29 | 3 | 1 (July 20) | | | 40 | 3 | 11 (October 5) | | | 42 | 4 | 1 (October 18) | | @@ -223,16 +229,15 @@ working time. | -------- | -------- | -------- | -------- | | 1 | 1 | 1 (January 3) | | | 12 | 1 | 12 (March 15) | | -| 14 | 2 | 1 (March 28) | Probable KubeCon EU | +| 14 | 2 | 1 (March 28) | Probable KubeCon EU Break| | 26 | 2 | 12 (June 28) | | | 28 | 3 | 1 (July 11) | | | 40 | 3 | 12 (October 4) | | -| 42 | 4 | 1 (October 17) | Probably KubeCon NA | +| 42 | 4 | 1 (October 17) | Probably KubeCon NA Break | | 52 | 4 | 10 (Dec 28) | | -This KEP proposes a transition to a *3-releases-per-year cadence*, beginning -with the Kubernetes 1.22 Release. 
This would result in a *15* week release -cycle, with *2* weeks between release cycles. +With the proposed change in cadence, the notional schedules for the remainder of +2021 and 2022 are shown below: *Kubernetes Release Schedule 2021 (Proposed 3 Release Cadence)* @@ -241,9 +246,9 @@ cycle, with *2* weeks between release cycles. | 2 | 1 | 1 (January 11) | | | 14 | 1 | 13 (April 8) | | | 17 | 2 | 1 (April 26) | | -| 32 | 2 | 15 (August 10) | KubeCon EU - 14 weeks of actual work| -| 35 | 3 | 1 (August 31) | | -| 50 | 3 | 15 (December 14) | Kubecon NA - 14 weeks of actual work | +| 31 | 2 | 14 (August 02) | KubeCon EU Break | +| 34 | 3 | 1 (August 23) | | +| 50 | 3 | 16 (December 14) | Kubecon Break | *Kubernetes Release Sechedule 2022 (Proposed 3 Release Cadence)* @@ -256,6 +261,15 @@ cycle, with *2* weeks between release cycles. | 36 | 3 | 1 (September 6 | Probably KubeCon NA | | 51 | 3 | 15 (December 20) | + +This KEP will be in the `alpha` stage for the Kubernetes 1.22 Release. During this +time, we will focus on communication of the cadence change. The KEP will promote to +the `beta` stage for the Kubernetes 1.23 Release, which will be the final release of +2021 and being our official *3-releases-per-year* cadence. After the 1.23, 1.24, and +1.25 Releases, will will collect feedback and incorporate that feedback into the +lightweight framework surrounding release schedule development and promote this +KEP to `stable` for the 1.26 Release. + ### User Stories Kubernetes releases are made by real people. The technical aspects—for example, @@ -331,6 +345,26 @@ features. With a longer release cycle, this reminder activity can be expected to slow down. As such, advancement will need to be mitigated by making sure that SIGs keep track of their feature enhancement in more detail. +#### Unprepared Kubernetes Users + +Kubernetes effectively moved to a *3 release per year* cadence in 2020, starting +with the Kubernetes 1.19 release. 
At the start of the Kubernetes 1.21 release, +there was communication that a permanent cadence change was under consideration; +however, this KEP was not submitted and approved within the Kubernetes 1.21 release +cycle, so there is a risk that downstream consumers may be unaware that such a +change was under consideration and could be caught by surprise by a cadence change. +Some downstream projects, such as Helm, have already begun +[planning](https://github.com/helm/community/blob/main/hips/hip-0002.md#minor-releases) +for this change. + +To mitigate this risk, SIG-Release will perform the following actions: + +* Once this KEP has merged, an email will be sent to the [k/dev](https://groups.google.com/g/kubernetes-dev) list +* A community meeting occurring during the 1.22 Release Cycle will be used to communicate the change +* Early in the Kubernetes 1.22 Release cycle, a blog will be written and published to +https://kubernetes.io/blog/ that fully explains this change. +* A tweet (linking to the blog) will be sent from the k8scontributors twitter account + ## Design Details ### Schedule Policy @@ -349,10 +383,12 @@ is defined as: 3. A Kubernetes release cycle has a length of of ~15 weeks. 4. Events like KubeCon will be considered as blocked from development or - decision-making from the SIG release perspective. SIG Release will also - consider the week before and after the event in the same way. + decision-making from the SIG release perspective and the release team will + not hold meetings during this week. The release team must also + consider the week before and after the event when setting deadlines to + minimize impact on contributors. -5. An explicit SIG release break of at least two weeks between each cycle will +5. An explicit SIG Release break of at least two weeks between each cycle will be enforced. This does not mean that zero development can happen during that time.
@@ -361,9 +397,10 @@ is defined as: ### Feedback survey -SIG Release will draft an experience survey after the first three releases from -which the new cadence has been applied. This survey which will include questions -around the release cadence and how it impacted end users. +SIG Release will draft an experience survey and distribute it to [k/dev](https://groups.google.com/g/kubernetes-dev)) and include it in the release notes of the first *three* releases from +which the new cadence has been applied. This survey will include questions +around the release cadence and how it impacted end users and can be used to make a final +decision regarding release cadence (i.e. promoting this KEP to stable). Survey contents are to be determined, but we welcome content suggestions to continually improve the process. diff --git a/keps/sig-release/2572-release-cadence/kep.yaml b/keps/sig-release/2572-release-cadence/kep.yaml index f94a1f453ab..7f1b23fa2fa 100644 --- a/keps/sig-release/2572-release-cadence/kep.yaml +++ b/keps/sig-release/2572-release-cadence/kep.yaml @@ -35,4 +35,4 @@ latest-milestone: "v1.22" milestone: alpha: "v1.22" beta: "v1.23" - stable: "v1.25" + stable: "v1.26" From 43b190874deeb8be32a942645f9378bff821425c Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Thu, 15 Apr 2021 09:46:14 +0200 Subject: [PATCH 29/34] Add increased enhancements lifecycle risk Signed-off-by: Sascha Grunert --- keps/sig-release/2572-release-cadence/README.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index dded92b0459..2952640c692 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -28,6 +28,7 @@ - [Attention to tests](#attention-to-tests) - [Attention to dependencies](#attention-to-dependencies) - [Feature graduation](#feature-graduation) - [Unprepared Kubernetes Users](#unprepared-kubernetes-users) + - [Increased enhancements
lifecycle](#increased-enhancements-lifecycle) - [Design Details](#design-details) - [Schedule Policy](#schedule-policy) - [Feedback survey](#feedback-survey) @@ -365,6 +366,17 @@ To mitigate this risk, SIG-Release will perform the following actions: https://kubernetes.io/blog/ that fully explains this change. * A tweet (linking to the blog) will be sent from the k8scontributors twitter account +#### Increased enhancements lifecycle + +With a 4 releases/year cadence, an enhancement could graduate from alpha to beta +to GA in 9 months, with truly trivial features sometimes skipping beta. On the +proposed 3 releases/year cadence, the best possible case is 12 months for a +3-phase feature and 8 months if skipping beta, assuming the graduation rules do not +change. These drawn-out timelines may cause more features to skip beta or take +more risks in advancing phases, even when not confident. The mitigation here is +human vigilance and engineering discipline to hold the line and say "no" when +appropriate. + ## Design Details ### Schedule Policy From 1d9d76fe56589ca42484cd53a687d478537c7c0f Mon Sep 17 00:00:00 2001 From: Jeremy Date: Thu, 15 Apr 2021 10:53:17 -0600 Subject: [PATCH 30/34] Update KEP with 15 week release for 1.22 Signed-off-by: Jeremy --- .../2572-release-cadence/README.md | 28 ++++++++++--------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 2952640c692..49f1dfc378c 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -61,8 +61,10 @@ ## Summary With this KEP, SIG Release proposes to change the current Kubernetes release -cadence from 4 down to 3 releases per year. This would start with the -Kubernetes 1.23 release cycle. +cadence from 4 down to 3 releases per year. This cadence started in ad-hoc manner +in 2020, and will be formalized with the KEP.
This will be reflected in the +release calendar for the Kubernetes 1.22 and 1.23 releases, which will each be +*15* weeks in duration. ## Motivation @@ -195,7 +197,7 @@ https://github.com/kubernetes/sig-release/issues/1494. ## Proposal This KEP proposes a transition to a *3-releases-per-year cadence*, beginning -with the Kubernetes 1.23 Release. This would result in a *15* week release +with the Kubernetes 1.22 Release. This would result in a *15* week release cycle, with *2* weeks between release cycles. During the Kubernetes 1.22 release, a focused communication effort will be undertaken to communicate to contributors and the end user community. @@ -247,9 +249,9 @@ With the proposed change in cadence, the notional schedules for the remainder of | 2 | 1 | 1 (January 11) | | | 14 | 1 | 13 (April 8) | | | 17 | 2 | 1 (April 26) | | -| 31 | 2 | 14 (August 02) | KubeCon EU Break | -| 34 | 3 | 1 (August 23) | | -| 50 | 3 | 16 (December 14) | Kubecon Break | +| 32 | 2 | 15 (August 02) | KubeCon EU Break (May 4-7) | +| 35 | 3 | 1 (August 30) | | +| 50 | 3 | 15 (December 14) | KubeCon Break (Oct 12-15) | *Kubernetes Release Sechedule 2022 (Proposed 3 Release Cadence)* @@ -263,13 +265,13 @@ With the proposed change in cadence, the notional schedules for the remainder of | 51 | 3 | 15 (December 20) | -This KEP will be in the `alpha` stage for the Kubernetes 1.22 Release. During this -time, we will focus on communication of the cadence change. The KEP will promote to -the `beta` stage for the Kubernetes 1.23 Release, which will be the final release of -2021 and being our official *3-releases-per-year* cadence. After the 1.23, 1.24, and -1.25 Releases, will will collect feedback and incorporate that feedback into the -lightweight framework surrounding release schedule development and promote this -KEP to `stable` for the 1.26 Release. +This KEP will be in the `alpha` stage for the Kubernetes 1.22 Release.
During +this time, SIG-Release will focus on communication of the cadence change through +all available mechanisms. The KEP will promote to the `beta` stage for the +Kubernetes 1.23 Release, which will be the final release of 2021. After +the 1.23, 1.24, and 1.25 Releases, will will collect feedback and incorporate +that feedback into the lightweight framework surrounding release schedule +development and promote this KEP to `stable` for the 1.26 Release. ### User Stories From 0def15c4112e07a529a30d6dc402948a17d97a1e Mon Sep 17 00:00:00 2001 From: Jeremy Date: Thu, 15 Apr 2021 16:35:03 -0600 Subject: [PATCH 31/34] Update authors, review feedback, clarifications Signed-off-by: Jeremy --- .../2572-release-cadence/README.md | 26 +++++++++++++------ .../sig-release/2572-release-cadence/kep.yaml | 7 +++-- 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 49f1dfc378c..2ffdad48029 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -61,10 +61,10 @@ ## Summary With this KEP, SIG Release proposes to change the current Kubernetes release -cadence from 4 down to 3 releases per year. This cadence started in ad-hoc manner -in 2020, and will be formalized with the KEP. This will be reflected in the -release calendar for the Kubernetes 1.22 and 1.23 releases, which will each be -*15* weeks in duration. +cadence from 4 down to 3 releases per year. This cadence started in ad hoc manner +in 2020 due to the ongoing COVID-19 pandemic. This KEP serves to formalize this release +cadence, which will be shape the development of the release calendars for the +Kubernetes 1.22 and 1.23 releases, each of which will be *15* weeks in duration. ## Motivation @@ -74,7 +74,9 @@ releases 4 times per year, are ongoing in the community. 
The extended release schedule for 1.19 resulted in only three minor Kubernetes releases for 2020. As a result, SIG Release received several questions across a variety of platforms and communication channels about whether the project -intends to only have three minor releases/year. +intends to only have three minor releases/year, as a lot of folks, both +contributors and end users, need to be able to plan ahead and expect a predictable +release cadence. ### Goals @@ -196,14 +198,14 @@ https://github.com/kubernetes/sig-release/issues/1494. ## Proposal -This KEP proposes a transition to a *3-releases-per-year cadence*, beginning +This KEP proposes a transition to a *3-releases-per-calendar-year cadence*, beginning with the Kubernetes 1.22 Release. This would result in a *15* week release cycle, with *2* weeks between release cycles. During the Kubernetes 1.22 release, a focused communication effort will be undertaken to communicate to contributors and the end user community. The following tables detail a notional timeline for the remainder of 2021 and -for 2022, leveraging the historical *4-releases-per-year cadence*. Generally, +for 2022, leveraging the historical *4-releases-per-calendar-year cadence*. Generally, code freeze remains in effect until the last week of the release, so development for the next release generally starts prior to the official release team kickoff. A minimum of 1 week is needed between releases to fully form the @@ -273,12 +275,20 @@ the 1.23, 1.24, and 1.25 Releases, will will collect feedback and incorporate that feedback into the lightweight framework surrounding release schedule development and promote this KEP to `stable` for the 1.26 Release. +| Release Number | Stage | +|----------------|-------| +| 1.22 | Alpha | +| 1.23 | Beta | +| 1.24 | Beta | +| 1.25 | Beta | +| 1.26 | Stable | + ### User Stories Kubernetes releases are made by real people.
The technical aspects—for example, the release automation—reflects only a tiny part of the complete cycle. This means we will mainly focus on the human aspects and their corresponding roles -when deciding to move to a 3-releases-per-year cadence. +when deciding to move to a 3-releases-per-calendar-year cadence. #### End User diff --git a/keps/sig-release/2572-release-cadence/kep.yaml b/keps/sig-release/2572-release-cadence/kep.yaml index 7f1b23fa2fa..d740647726d 100644 --- a/keps/sig-release/2572-release-cadence/kep.yaml +++ b/keps/sig-release/2572-release-cadence/kep.yaml @@ -1,7 +1,12 @@ title: Defining the Kubernetes Release Cadence kep-number: 2572 authors: + - "@kikisdeliveryservice" + - "@jeremyrickard" + - "@jberkus" - "@justaugustus" + - "@LappleApple" + - "@saschagrunert" owning-sig: sig-release participating-sigs: - sig-architecture @@ -11,8 +16,6 @@ creation-date: 2021-01-21 reviewers: - "@ehashman" - "@hasheddan" - - "@jeremyrickard" - approvers: - "@BenTheElder" #sig-testing - "@derekwaynecarr" #sig-architecture From d696d3280251c0fd4727706be4124c4a72146858 Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Mon, 19 Apr 2021 09:21:25 +0200 Subject: [PATCH 32/34] Fix formatting of survey paragraph Signed-off-by: Sascha Grunert --- keps/sig-release/2572-release-cadence/README.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 2ffdad48029..34d40eabf5b 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -421,10 +421,12 @@ is defined as: ### Feedback survey -SIG Release will draft an experience survey and distribute it to [k/dev](https://groups.google.com/g/kubernetes-dev)) and include it in the release notes of the first *three* releases from -which the new cadence has been applied. 
This survey will include questions -around the release cadence and how it impacted end users and can be used to make a final -decision regarding release cadence (i.e. promoting this KEP to stable). +SIG Release will draft an experience survey, distribute it to +[k/dev](https://groups.google.com/g/kubernetes-dev) and include it in the +release notes of the first *three* releases from which the new cadence has been +applied. This survey will include questions around the release cadence and how +it impacted end users and can be used to make a final decision regarding release +cadence (i.e. promoting this KEP to stable). Survey contents are to be determined, but we welcome content suggestions to continually improve the process. From 41d22d389e6c746184e404844f2f919594ef4b80 Mon Sep 17 00:00:00 2001 From: Jeremy Date: Tue, 20 Apr 2021 07:52:29 -0600 Subject: [PATCH 33/34] Amend 2022 dates Signed-off-by: Jeremy --- keps/sig-release/2572-release-cadence/README.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index 34d40eabf5b..d33193d7da0 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -261,11 +261,10 @@ With the proposed change in cadence, the notional schedules for the remainder of | -------- | -------- | -------- | -------- | | 1 | 1 | 1 (January 3) | | | 15 | 1 | 15 (April 12) | | -| 18 | 2 | 1 (May 2) | Probably KubeCon EU | -| 33 | 2 | 15 (August 15) | | -| 36 | 3 | 1 (September 6 | Probably KubeCon NA | -| 51 | 3 | 15 (December 20) | - +| 17 | 2 | 1 (April 26) | Probably KubeCon EU | +| 32 | 2 | 15 (August 09) | | +| 34 | 3 | 1 (August 22) | Probably KubeCon NA | +| 49 | 3 | 14 (December 06) | This KEP will be in the `alpha` stage for the Kubernetes 1.22 Release.
During this time, SIG-Release will focus on communication of the cadence change through From 2887453eac5cbc5fbd31112fd3d0be2be17b456c Mon Sep 17 00:00:00 2001 From: Jeremy Date: Tue, 20 Apr 2021 07:56:46 -0600 Subject: [PATCH 34/34] review comment cleanup Signed-off-by: Jeremy --- keps/sig-release/2572-release-cadence/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-release/2572-release-cadence/README.md b/keps/sig-release/2572-release-cadence/README.md index d33193d7da0..52e21b6f1b5 100644 --- a/keps/sig-release/2572-release-cadence/README.md +++ b/keps/sig-release/2572-release-cadence/README.md @@ -63,7 +63,7 @@ With this KEP, SIG Release proposes to change the current Kubernetes release cadence from 4 down to 3 releases per year. This cadence started in ad hoc manner in 2020 due to the ongoing COVID-19 pandemic. This KEP serves to formalize this release -cadence, which will be shape the development of the release calendars for the +cadence, which will shape the development of the release calendars for the Kubernetes 1.22 and 1.23 releases, each of which will be *15* weeks in duration. ## Motivation