k3s: add packaging README regarding release versioning #224483
Merged
@@ -0,0 +1,73 @@
# k3s versions

K3s, Kubernetes, and other clustered software cannot be updated atomically. Most software in nixpkgs, such as bash, can be updated as part of a "nixos-rebuild switch" without having to worry about the old and the new bash interacting in some way.

K3s/Kubernetes, on the other hand, is typically run across several NixOS machines, and each NixOS machine is updated independently. As such, different versions of the package and NixOS module must maintain compatibility with each other through temporary version skew during updates.

The upstream Kubernetes project [documents this in their version-skew policy](https://kubernetes.io/releases/version-skew-policy/#supported-component-upgrade-order).

Within nixpkgs, we strive to maintain a valid "upgrade path" that does not run
afoul of the upstream version skew policy.

## Upstream release cadence and support

K3s is built on top of K8s, and typically provides a similar release cadence and support window (simply by cherry-picking over k8s patches). As such, we assume k3s's support lifecycle is identical to upstream K8s.

This is documented upstream [here](https://kubernetes.io/releases/patch-releases/#support-period).

In short, a new Kubernetes version is released roughly every 4 months, and each release is supported for a little over 1 year.

Any version that is not supported by upstream should be dropped from nixpkgs.

## Versions in NixOS releases

NixOS releases should avoid having deprecated software, or making major version upgrades, wherever possible.

As such, we would like to have only the newest K3s version in each NixOS
release at the time the release branch is branched off, which will ensure the
K3s version in that release will receive updates for the longest duration
possible.

However, this conflicts with another desire: we would like people to be able to upgrade between NixOS stable releases without needing to make a large enough k3s version jump that they violate the Kubernetes version skew policy.

To give an example, we may have the following timeline for k8s releases:

(Note: the exact versions and dates may be wrong; this is an illustrative example, and reality may differ.)

```mermaid
gitGraph
branch k8s
commit
branch "k8s-1.24"
checkout "k8s-1.24"
commit id: "1.24.0" tag: "2022-05-03"
branch "k8s-1.25"
checkout "k8s-1.25"
commit id: "1.25.0" tag: "2022-08-23"
branch "k8s-1.26"
checkout "k8s-1.26"
commit id: "1.26.0" tag: "2022-12-08"
checkout "k8s-1.24"
commit id: "1.24-EOL" tag: "2023-07-28"
checkout "k8s-1.25"
commit id: "1.25-EOL" tag: "2023-10-27"
checkout "k8s-1.26"
commit id: "1.26-EOL" tag: "2024-02-28"
```

(Note: the above graph will render if you view this markdown on GitHub, or when using [mermaid](https://mermaid.js.org/))

In this scenario, even though k3s 1.24 is still technically supported when the NixOS 23.05
release is cut, we would not want to include it, since it goes EOL before the NixOS 23.11
release is made. Similarly, k3s 1.25 would go EOL before NixOS 23.11.

As such, we should only include k3s 1.26 in the 23.05 release.
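
As a rough illustration of this selection rule, the sketch below filters the example EOL dates from the graph against the date of the next NixOS release. The function name `versions_to_include` and the 2023-11-30 release date are assumptions made for the example, not part of this policy or of nixpkgs tooling.

```python
from datetime import date

# Illustrative EOL dates copied from the graph above; real dates may differ.
K8S_EOL = {
    "1.24": date(2023, 7, 28),
    "1.25": date(2023, 10, 27),
    "1.26": date(2024, 2, 28),
}

# Assumed (placeholder) date for the NixOS 23.11 release.
NIXOS_23_11 = date(2023, 11, 30)


def versions_to_include(eol_by_version, next_nixos_release):
    """Return the minor versions still supported when the next NixOS release ships."""
    kept = [v for v, eol in eol_by_version.items() if eol >= next_nixos_release]
    # Sort numerically so that "1.9" would come before "1.10".
    return sorted(kept, key=lambda v: tuple(int(part) for part in v.split(".")))


print(versions_to_include(K8S_EOL, NIXOS_23_11))  # ['1.26']
```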

We can then make a similar argument when NixOS 23.11 comes around to not
include k3s 1.26 or 1.27. However, that means someone upgrading from the NixOS
23.05 release to NixOS 23.11 would not have a supported upgrade path.

In order to resolve this issue, we propose backporting not just new patch releases to older NixOS releases, but also new k3s versions, up to one version before the first version that is included in the next NixOS release.

In the above example, where NixOS 23.05 included k3s 1.26, and 23.11 included k3s 1.28, that means we would backport 1.27 to the NixOS 23.05 release, and backport all patches for 1.26 and 1.27.
This would allow someone to upgrade between those NixOS releases in a supported configuration.
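
The backport rule can be sketched as follows. This is a minimal illustration that treats k3s minor versions as plain integers; `backport_minors` and `upgrade_path` are hypothetical helper names, not existing nixpkgs functions.

```python
def backport_minors(shipped_minor, next_release_minor):
    """Minor versions to backport to the older NixOS release.

    These are the versions strictly between the one the older release
    shipped with and the first one included in the next NixOS release.
    """
    return list(range(shipped_minor + 1, next_release_minor))


def upgrade_path(shipped_minor, next_release_minor):
    """Every minor version a cluster passes through, one minor step at a time."""
    return [
        shipped_minor,
        *backport_minors(shipped_minor, next_release_minor),
        next_release_minor,
    ]


# NixOS 23.05 ships k3s 1.26 and NixOS 23.11 ships k3s 1.28.
print(backport_minors(26, 28))  # [27]
print(upgrade_path(26, 28))     # [26, 27, 28]
```

With 1.27 backported to 23.05, each hop in the path moves by exactly one minor version, which is what the version skew policy requires.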
I'm a k3s maintainer. I opened the original issue about the need to keep several versions of k3s available in parallel during the lifecycle of a release. #213943 (comment) has all the details.
As you can see there, the very 1st requirement for doing a properly supported k8s upgrade is:
We need to keep the different versions alive to be able to comply with this very 1st requirement.
For example, if we drop k3s 1.25.8+k3s1 before releasing nixos 22.05 and then, one month later, there's a new k3s 1.25.9+k3s1 release, users would be unable to upgrade unless:
None of these is nice, so we should keep the versions around.
FWIW, sometimes one can't upgrade k8s directly, not because the release isn't yet out, but because you have some operator that still doesn't support the new version. Upgrading a cluster is a very delicate operation, so in this case I think we should just have available versions for all the upstream supported versions, as long as they're supported.
We are shuffling around in which NixOS release we keep which k3s release, to try to overlap the supported releases with the NixOS support cycle.
Can you point out where this plan comes out short?
I thought that's what I just did 😅. I guess I didn't explain myself clearly... it'll be better with an example.
A real-world example: dealing with Rancher
Let's say it's 2023-05-01. NixOS 23.05 is released. My servers are using NixOS 22.11 with k3s-v1.24.10+k3s1. I want to upgrade them.
Why am I using that K3s version instead of 1.25 or 1.26? Because I use Rancher 2.7.1 on that cluster. And, according to Rancher's support matrix, the highest supported k3s version by Rancher 2.7.1 is 1.24:
Also because 1.24 is still a supported k8s and k3s release until 2023-07-28. So, everything is supported if I stay on 1.24. If I update to 1.25, Rancher is not supported. Thus, I stay on 1.24.
When will Rancher support 1.25? According to rancher/rancher#38701, quite soon in the 2.7.2 release. But I still don't know when that'll be released.
What am I to expect from NixOS? Well, I expect it still has K3s 1.24 releases available, because that's still supported upstream. Let's say NixOS is nice to me and does that. I upgrade my servers to NixOS 23.05 but keep K3s running on the 1.24 derivation.
Time goes by, a couple of weeks pass, and we're at 2023-05-15. It turns out Rancher 2.7.2 got released. It supports k3s 1.25. Cool! Let's upgrade. I install Rancher through the helm chart, so it has nothing to do with NixOS. Let's say I do that and it upgrades without problems.
Ok, time to upgrade my cluster! How? Following #213943 (comment). As explained above, step number 1 is to upgrade the cluster to the latest patch release of the minor release I'm currently using. Which one is it? k3s v1.24.12+k3s1 is already available (although in this future scenario, it could be something even newer).
Since I maintain K3s and just noticed it's some versions behind upstream, I open a PR to nixpkgs, we merge that, I update my servers, and get the latest patch release for 1.24 (which BTW includes CVE fixes). The task is done: I'm on the latest K3s 1.24.x release. 🏆
Now I must update to K3s 1.25.x on the most updated patch version. Let's take a look. Currently, on NixOS that's 1.25.3+k3s1; but upstream is on v1.25.8+k3s1 already. Just like before, I update it on nixpkgs before proceeding to the next step.
The next step is a bit more delicate. I have to upgrade my cluster to 1.25 in order (servers first, one by one; then workers in no particular order). K3s 1.25.8 is already on nixpkgs, so I upgrade my servers following that process. Cool! Finished!
Now, should I take this chance to update to 1.26? Well, I'll have to start over again:
How does the example matter to NixOS?
The example shows that upgrading K3s for production is complex and delicate. It also shows that a sysadmin can still need to stick to lower-but-still-supported releases for a while because of good reasons.
If nixpkgs drops support for K3s < 1.26 while upstream still supports them, then the required step of upgrading to the newest patch release of the minor release you're currently running can't be done (with official packages).
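
A minimal sketch of that dependency, using the versions from the example above. The `parse` and `upgrade_plan` helpers are hypothetical and only model the "latest patch of each minor, one minor at a time" rule; they say nothing about the order in which servers and workers are restarted.

```python
import re


def parse(version):
    """Split a k3s version such as 'v1.24.10+k3s1' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version}")
    return tuple(int(group) for group in match.groups())


def upgrade_plan(current, available, target_minor):
    """List the packaged versions to step through, one minor release at a time."""
    major, minor, _ = parse(current)
    plan = []
    for step in range(minor, target_minor + 1):
        candidates = [v for v in available if parse(v)[:2] == (major, step)]
        if not candidates:
            # This is the situation described above: a required minor was dropped.
            raise LookupError(f"no packaged release for {major}.{step}")
        latest = max(candidates, key=parse)
        if parse(latest) > parse(current):
            plan.append(latest)
    return plan


available = ["v1.24.10+k3s1", "v1.24.12+k3s1", "v1.25.3+k3s1", "v1.25.8+k3s1"]
print(upgrade_plan("v1.24.10+k3s1", available, target_minor=25))
# ['v1.24.12+k3s1', 'v1.25.8+k3s1']
```

If the 1.24 and 1.25 packages are removed from `available`, the same call raises `LookupError`, which is the gap I'm worried about.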
NixOS users should be able to predict k3s support based on the upstream calendar, because the other in-cluster tools that they are using use that calendar, not NixOS'.
My proposal
So IMHO, to make NixOS the best OS for running k3s, it should:
I think this is what got you spooked. The idea going forward is:
Basically we're flipping the order how things are done. Instead of stuffing the new NixOS release with end of life releases, that the user needs to pass through for updates, we instead provide an update path on the previous NixOS release.
So, do you mean that the new plan would be this?
That last bold point is the pain point for me. According to https://endoflife.date/nixos, NixOS 22.11 will EOL on 2023-06-30. So NixOS will go EOL before k3s 1.25 goes EOL.
So, does that mean that by the time I can upgrade the cluster to 1.26 (if Rancher takes longer than NixOS to add support) I won't have an upgrade path? 😵
I understand what you're saying, @yajo, and I think it's sorta a real concern, but I don't think NixOS actually wants to support it.
The observation, if I understand it correctly, is that the following points conflict:
That seems true, but I also think that you can only encounter that issue if you're running an unsupported NixOS release. It seems totally expected that an unsupported NixOS release isn't supported, which I think sums up the issue there.
Said another way, having a "correct" path to upgrade is a moving target, and while the plan described in this document makes it so we hit that target while NixOS releases are still supported, upstream changes may make it so we no longer meet that target.
In your example, if you updated to NixOS 23.05 / k3s 1.26 before 22.11 went out of support, you would remain in a supported configuration by NixOS and k3s/k8s the whole time.
I think that's totally fine. Stay on supported NixOS releases, and things can work, stay on an unsupported release, and you're now in an unsupported path (that still probably works! it was supported in the past!).
Which brings us to the other point you're discussing - Rancher.
It seems like Rancher's support matrix lags behind quite a bit.
I think the actionable thing you're requesting here is to update the policy from "NixOS's supported releases attempt to have the latest k3s release when it is cut, ensuring it is supported for the NixOS release lifecycle" to "NixOS's supported releases have the latest k3s release and a k3s release supported by Rancher".
I think if we change to that statement, the rest naturally falls out of that correctly.
That said, I personally don't want to support older k3s versions. I don't use Rancher, and their support matrix and updates seem to be at a pace which doesn't really align that well with NixOS's release lifecycle, so I'm wary it's not a great fit.
Is there some factor that makes tying our supported versions to rancher's slower support matrix compelling?
Do I understand the issue you're seeing here correctly?
I mean, if someone wants to run an unsupported NixOS release and get backports of patch releases for k3s, it's not really hard to do (I would even go so far as to call it fairly trivial, in my experience), but you have to do it yourself or pay someone to do it.
Well, this is unexpected and confusing, TBH.
I just used Rancher as an example. But I have a mix of operators, apps and custom deployments running in K8s where each one of them evolves at a different pace. I just picked the 1st that would mean a problem. Rancher, in this case. But it is quite easy to see that any of them can be a problem because of this choice on NixOS side.
It is expected that all of them support the currently-supported k8s versions. But we can't expect all of them to support the latest k8s version at the date of launch of the latest NixOS version. Even less when there are 2 NixOS releases per year and 3 k8s releases per year. There'll always be some drift.
With the proposed "solution", you force NixOS users to choose between:
Not a very pleasant choice to make.
There are other cases where NixOS has various supported versions of the same app. You can use python37, python38, python39, python310 and python311 only on NixOS 22.11. Ain't that the magic of NixOS? Why can't we just do the same for k3s?
Apologies for the slow response!
I guess I don't really know the best thing to do here. I agree that other apps/operators can lag behind some amount, which can make it more difficult to upgrade promptly.
Basically, I think the options we have on the NixOS side are:
I'm arguing for 2 because it's less maintenance work, and because in practice I haven't run into the issues you speak of. Everything I use has worked "fine" when upgrading, even if I upgrade before they announce official support or such. The k8s project's backwards compatibility story means that's supposed to typically be the case.
I believe you're arguing for 3, right?