k3s: add packaging README regarding release versioning #224483

Merged May 22, 2023 (1 commit)

`pkgs/applications/networking/cluster/k3s/README.md` (73 additions, 0 deletions)
@@ -0,0 +1,73 @@
# k3s versions

K3s, Kubernetes, and other clustered software have the property of not being able to update atomically. Most software in nixpkgs, such as bash, can be updated as part of a "nixos-rebuild switch" without having to worry about the old and the new bash interacting in some way.

K3s/Kubernetes, on the other hand, is typically run across several NixOS machines, and each NixOS machine is updated independently. As such, different versions of the package and NixOS module must maintain compatibility with each other through temporary version skew during updates.

The upstream Kubernetes project [documents this in their version-skew policy](https://kubernetes.io/releases/version-skew-policy/#supported-component-upgrade-order).

Within nixpkgs, we strive to maintain a valid "upgrade path" that does not run
afoul of the upstream version skew policy.
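To make "upgrade path" concrete, here is a rough sketch that checks whether a sequence of k3s minor versions ever jumps more than one minor version at a time. It models only the "do not skip minor versions" part of the upstream skew policy. The function name and the `(major, minor)` tuple representation are hypothetical and do not come from nixpkgs.

```python
# Hypothetical helper: each version is a (major, minor) tuple, e.g. (1, 25).
def is_valid_upgrade_path(path: list[tuple[int, int]]) -> bool:
    """Return True if every step stays on the same minor or moves up by exactly one."""
    for (old_major, old_minor), (new_major, new_minor) in zip(path, path[1:]):
        # Assumption: only 1.x releases matter, so any major change is rejected.
        if new_major != old_major:
            return False
        # Downgrades and skipped minors are both rejected.
        if new_minor - old_minor not in (0, 1):
            return False
    return True


print(is_valid_upgrade_path([(1, 24), (1, 25), (1, 26)]))  # True
print(is_valid_upgrade_path([(1, 24), (1, 26)]))  # False: skips 1.25
```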

## Upstream release cadence and support

K3s is built on top of K8s, and typically provides a similar release cadence and support window (simply by cherry-picking over k8s patches). As such, we assume k3s's support lifecycle is identical to upstream K8s.

This is documented upstream [here](https://kubernetes.io/releases/patch-releases/#support-period).

In short, a new Kubernetes version is released roughly every 4 months, and each release is supported for a little over 1 year.
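A back-of-the-envelope estimate that uses only the two figures above (a rough approximation, not an upstream guarantee): the number of minor versions supported at the same time is about the support window divided by the release interval.

```math
N_{\text{supported}} \approx \frac{T_{\text{support}}}{T_{\text{release}}} \approx \frac{12\ \text{months}}{4\ \text{months}} = 3
```

Because support lasts a little over a year, a fourth minor version can briefly overlap just after a new release.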

Any version that is not supported by upstream should be dropped from nixpkgs.
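The dropping rule can be sketched as a filter over an EOL table. This is a minimal sketch under a few assumptions. The EOL dates are the illustrative ones from the example timeline in this README, not authoritative data; the real dates are on https://kubernetes.io/releases/patch-releases/#support-period. A version is also assumed to count as supported through its EOL day. `supported_minors` and `version_key` are hypothetical names.

```python
from datetime import date

# Illustrative EOL dates copied from this README's example timeline (NOT authoritative).
EXAMPLE_EOL = {
    "1.24": date(2023, 7, 28),
    "1.25": date(2023, 10, 27),
    "1.26": date(2024, 2, 28),
}


def version_key(minor: str) -> tuple[int, ...]:
    """Sort "1.9" before "1.10" by comparing numerically, not as strings."""
    return tuple(int(part) for part in minor.split("."))


def supported_minors(eol: dict[str, date], today: date) -> list[str]:
    """Return the minor versions whose upstream support has not ended yet.

    Assumption: a version still counts as supported on its EOL day.
    """
    return sorted((m for m, end in eol.items() if end >= today), key=version_key)


# On 2023-08-01, 1.24 is already past EOL, so it would be dropped from nixpkgs.
print(supported_minors(EXAMPLE_EOL, date(2023, 8, 1)))  # ['1.25', '1.26']
```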

## Versions in NixOS releases

NixOS releases should avoid having deprecated software, or making major version upgrades, wherever possible.

As such, we would like to have only the newest K3s version in each NixOS
release at the time the release branch is branched off, which will ensure the
K3s version in that release will receive updates for the longest duration
possible.

However, this conflicts with another desire: we would like people to be able to upgrade between NixOS stable releases without needing to make a large enough k3s version jump that they violate the Kubernetes version skew policy.

To give an example, we may have the following timeline for k8s releases:

(Note: the exact versions and dates may be wrong; this is an illustrative example, and reality may differ.)

```mermaid
gitGraph
branch k8s
commit
branch "k8s-1.24"
checkout "k8s-1.24"
commit id: "1.24.0" tag: "2022-05-03"
branch "k8s-1.25"
checkout "k8s-1.25"
commit id: "1.25.0" tag: "2022-08-23"
branch "k8s-1.26"
checkout "k8s-1.26"
commit id: "1.26.0" tag: "2022-12-08"
checkout k8s-1.24
commit id: "1.24-EOL" tag: "2023-07-28"
checkout k8s-1.25
commit id: "1.25-EOL" tag: "2023-10-27"
checkout k8s-1.26
commit id: "1.26-EOL" tag: "2024-02-28"
```

(Note: the above graph will render if you view this markdown on GitHub, or when using [mermaid](https://mermaid.js.org/))

In this scenario, even though k3s 1.24 is still technically supported when the NixOS 23.05
release is cut, it goes EOL before the NixOS 23.11 release is made, so we would
not want to include it. Similarly, k3s 1.25 would go EOL before NixOS 23.11.

As such, we should only include k3s 1.26 in the 23.05 release.
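The inclusion rule from this example can be sketched as follows: keep a minor version in a new NixOS release only if it is still supported when the next NixOS release ships. The EOL dates are the illustrative ones from the timeline above. The 23.11 release date is a hypothetical placeholder, since only its ordering relative to the EOL dates matters. `minors_for_release` is a hypothetical name.

```python
from datetime import date

# Illustrative EOL dates from the timeline above (NOT authoritative).
EXAMPLE_EOL = {
    "1.24": date(2023, 7, 28),
    "1.25": date(2023, 10, 27),
    "1.26": date(2024, 2, 28),
}


def minors_for_release(eol: dict[str, date], next_release: date) -> list[str]:
    """Keep only minors that are still supported when the *next* NixOS release ships."""
    keep = [m for m, end in eol.items() if end > next_release]
    return sorted(keep, key=lambda m: tuple(int(p) for p in m.split(".")))


# Hypothetical date for the NixOS 23.11 release (late November 2023).
NIXOS_23_11 = date(2023, 11, 30)
print(minors_for_release(EXAMPLE_EOL, NIXOS_23_11))  # ['1.26']
```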

Contributor

I'm a k3s maintainer. I opened the original issue about the need to have several versions of k3s in parallel during the lifecycle of a release. In #213943 (comment) you have all the details.

As you can see there, the very 1st requirement for doing a properly supported k8s upgrade is:

> Upgrade your server nodes to the latest patch version available. One node at a time.

We need to keep the different versions alive to be able to comply with this very 1st requirement.
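Finding "the latest patch version available" for a given minor is mechanical. Here is a minimal sketch. It assumes upstream tags look like the versions quoted in this thread (`vMAJOR.MINOR.PATCH+k3sN`, e.g. `v1.25.8+k3s1`). The tag list is illustrative, and the `-rc1` tag is a hypothetical example of a non-release tag that gets skipped. `latest_patch` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed tag shape, based on versions quoted in this thread (e.g. "v1.25.8+k3s1").
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)\+k3s(\d+)$")


def latest_patch(tags: list[str], minor: str) -> Optional[str]:
    """Return the newest release tag for a minor such as "1.25", or None if none exists."""
    want = tuple(int(p) for p in minor.split("."))
    best_key, best_tag = None, None
    for tag in tags:
        match = TAG_RE.match(tag)
        if match is None:
            continue  # not a plain release tag (for example a release candidate)
        major, minor_num, patch, k3s_rev = (int(g) for g in match.groups())
        if (major, minor_num) != want:
            continue
        key = (patch, k3s_rev)  # numeric comparison: 1.25.10 is newer than 1.25.9
        if best_key is None or key > best_key:
            best_key, best_tag = key, tag
    return best_tag


tags = ["v1.24.10+k3s1", "v1.24.12+k3s1", "v1.25.3+k3s1", "v1.25.8+k3s1", "v1.25.9-rc1+k3s1"]
print(latest_patch(tags, "1.24"))  # v1.24.12+k3s1
print(latest_patch(tags, "1.25"))  # v1.25.8+k3s1
print(latest_patch(tags, "1.26"))  # None
```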

For example, if we drop k3s 1.25.8+k3s1 before releasing NixOS 23.05 and then, one month later, there's a new k3s 1.25.9+k3s1 release, users would be unable to upgrade unless:

  1. They upgrade ignoring the recommended supported upstream procedure.
  2. They package 1.25 themselves.

None of these is nice, so we should keep the versions around.

FWIW, sometimes one can't upgrade k8s directly, not because the release isn't yet out, but because you have some operator that still doesn't support the new version. Upgrading a cluster is a very delicate operation, so in this case I think we should just have available versions for all the upstream supported versions, as long as they're supported.

Member

We are shuffling around in which NixOS release we keep which k3s release, to try to overlap the supported releases with the NixOS support cycle.

Can you point out where this plan comes out short?

Contributor

I thought that's what I just did 😅. I guess I didn't explain myself well... it'll be clearer with an example.

A real-world example: dealing with Rancher

Let's say it's 2023-05-01. NixOS 23.05 is released. My servers are using NixOS 22.11 with k3s-v1.24.10+k3s1. I want to upgrade them.

Why am I using that K3s version instead of 1.25 or 1.26? Because I use Rancher 2.7.1 on that cluster. And, according to Rancher's support matrix, the highest supported k3s version by Rancher 2.7.1 is 1.24:

[Screenshot: Rancher 2.7.1 support matrix]

Also because 1.24 is still a supported k8s and k3s release until 2023-07-28. So, everything is supported if I stay on 1.24. If I update to 1.25, Rancher is not supported. Thus, I stay on 1.24.

When will Rancher support 1.25? According to rancher/rancher#38701, quite soon in the 2.7.2 release. But I still don't know when that'll be released.

What am I to expect from NixOS? Well, I expect it still has K3s 1.24 releases available, because that's still supported upstream. Let's say NixOS is nice to me and does that. I upgrade my servers to NixOS 23.05 but keep K3s running on the 1.24 derivation.

Time goes by, a couple of weeks pass, and we're at 2023-05-15. It turns out Rancher 2.7.2 got released. It supports k3s 1.25. Cool! Let's upgrade. I install Rancher through the helm chart, so it has nothing to do with NixOS. Let's say I do that and it upgrades without problems.

Ok, time to upgrade my cluster! How? Following #213943 (comment). As explained above, step number 1 is to upgrade the cluster to the latest patch release of the minor release I'm currently using. Which one is it? k3s v1.24.12+k3s1 is already available (although in this future scenario, it could be something even newer).

Since I maintain K3s and just noticed it's some versions behind upstream, I open a PR to nixpkgs, we merge that, I update my servers, and get the latest patch release for 1.24 (which BTW includes CVE fixes). The task is done: I'm on the latest K3s 1.24.x release. 🏆

Now I must update to K3s 1.25.x on the most updated patch version. Let's take a look. Currently, on NixOS that's 1.25.3+k3s1; but upstream is on v1.25.8+k3s1 already. Just like before, I update it on nixpkgs before proceeding to the next step.

The next step is a bit more delicate. I have to upgrade my cluster to 1.25 in order (servers first, one by one; then workers in no particular order). K3s 1.25.8 is already in nixpkgs, so I upgrade my servers following that process. Cool, finished!

Now, should I take this chance to update to 1.26? Well, I'll have to start over again:

  1. Make sure Rancher supports k3s 1.26
  2. If so, make sure NixOS is on the latest k3s 1.25.x and 1.26.x releases (This time I have more chances to get a "yes" because there's an automated update script).
  3. When that's done, do the update.
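The procedure walked through above can be sketched as a plan generator. Starting at the latest patch of the current minor, it emits one step per minor version, each rolled out to servers one at a time and then to workers. This is a hypothetical sketch, not a nixpkgs or k3s tool. The mapping of packaged versions is illustrative. It fails when an intermediate minor is no longer packaged, which is exactly the situation this comment is worried about.

```python
def upgrade_plan(current: str, target: str, latest: dict[str, str]) -> list[str]:
    """Return the ordered upgrade steps from the `current` minor to the `target` minor.

    `latest` maps each minor version (e.g. "1.24") to its newest packaged patch
    release. Every minor from `current` to `target` must be present.
    """
    major, cur = (int(p) for p in current.split("."))
    tgt_major, tgt = (int(p) for p in target.split("."))
    if tgt_major != major or tgt < cur:
        raise ValueError("only forward upgrades within the same major version are modelled")
    steps = []
    for minor in range(cur, tgt + 1):
        key = f"{major}.{minor}"
        if key not in latest:
            # A minor on the path is no longer packaged: no supported upgrade path.
            raise LookupError(f"no k3s {key} package available")
        steps.append(f"servers one at a time, then workers: {latest[key]}")
    return steps


packaged = {"1.24": "v1.24.12+k3s1", "1.25": "v1.25.8+k3s1"}
for step in upgrade_plan("1.24", "1.25", packaged):
    print(step)
```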

How does the example matter to NixOS?

The example shows that upgrading K3s for production is complex and delicate. It also shows that a sysadmin may still need to stick to lower-but-still-supported releases for a while, for good reasons.

If nixpkgs drops support for K3s < 1.26 while upstream still supports them, then the required step of upgrading to the newest patch release of the minor release you're currently running can't be done (with official packages).

NixOS users should be able to predict k3s support based on the upstream calendar, because the other in-cluster tools that they are using use that calendar, not NixOS'.

My proposal

So IMHO, to make NixOS the best OS for running k3s, it should:

  • Provide one K3s package per minor version supported upstream at the date of the launch of NixOS.
  • Automated update scripts for all those minor versions, so our dear update bot makes sure they always match the latest patch version from upstream. This is done already for 1.26, so we can stick with manual updates for prior versions and just care for this on >= 1.26 if you want.
  • Once upstream drops support of a minor version, NixOS does too. But not before.

Member

@mweinelt, Apr 10, 2023

> As such, we should only include k3s 1.26 in the 23.05 release.

I think this is what got you spooked. The idea going forward is:

  • Include only the latest release in a new NixOS release
    • so that k3s versions don't go EOL during a NixOS release support lifecycle
  • Backport all minor releases into the previous release, except for the latest
    • so that the oldest k3s release in the previous NixOS release is just one minor release before the one in the new NixOS release
  • Backport all newer releases of k3s into the new release
    • until a k3s release support covers the full release cycle of NixOS n+1

Basically, we're flipping the order in which things are done. Instead of stuffing the new NixOS release with end-of-life releases that the user needs to pass through for updates, we instead provide an update path on the previous NixOS release.

Contributor

So, do you mean that the new plan would be this?

  • NixOS 23.05:
    • Released with k3s 1.26
    • Gets k3s 1.27 and 1.28 when they're published.
  • NixOS 22.11:
    • Has only k3s 1.24-1.25
    • **Will keep getting patches for 1.24.x and 1.25.x for the whole lifecycle of NixOS 23.05.** (Pay attention here because I don't think this will be true).

That last bold point is the pain point for me. According to https://endoflife.date/nixos, NixOS 22.11 will EOL on 2023-06-30. So NixOS will go EOL before k3s 1.25 goes EOL.
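A quick check of the size of that gap. It uses the NixOS 22.11 EOL date quoted above and the illustrative (not authoritative) k3s 1.25 EOL date from the README's example timeline.

```python
from datetime import date

nixos_22_11_eol = date(2023, 6, 30)  # as quoted above from endoflife.date
k3s_1_25_eol = date(2023, 10, 27)  # illustrative date from the README timeline

# Days during which k3s 1.25 is still supported upstream but NixOS 22.11 is not.
gap = k3s_1_25_eol - nixos_22_11_eol
print(gap.days)  # 119
```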

So, does that mean that by the time I can upgrade the cluster to 1.26 (if Rancher takes more than NixOS to upgrade support) I won't have an upgrade path? 😵

Member Author

I understand what you're saying, @yajo, and I think it's sorta a real concern, but I don't think NixOS actually wants to support it.

The observation, if I understand it correctly, is that the following points conflict:

  1. Old unsupported NixOS releases will not receive updates (naturally)
  2. K8s requires you to update to the latest patch release before upgrading to the next minor release
  3. Therefore, updating from an unsupported NixOS release to a newer one will be unsupported if there are any k8s patch releases

That seems true, but I also think that you can only encounter that issue if you're running an unsupported NixOS release. It seems totally expected that an unsupported NixOS release isn't supported, which I think sums up the issue there.

Said another way, having a "correct" path to upgrade is a moving target, and while the plan described in this document makes it so we hit that target while NixOS releases are still supported, upstream changes may make it so we no longer meet that target.

In your example, if you updated to NixOS 23.05 / k3s 1.26 before 22.11 went out of support, you would remain in a supported configuration by NixOS and k3s/k8s the whole time.
I think that's totally fine. Stay on supported NixOS releases, and things can work; stay on an unsupported release, and you're now on an unsupported path (that still probably works! it was supported in the past!).

Which brings us to the other point you're discussing - Rancher.
It seems like Rancher's support matrix lags behind quite a bit.

I think the actionable thing you're requesting here is to update the policy from "NixOS's supported releases attempt to have the latest k3s release when it is cut, ensuring it is supported for the NixOS release lifecycle" to "NixOS's supported releases have the latest k3s release and a k3s release supported by Rancher".

I think if we change to that statement, the rest naturally falls out of that correctly.

That said, I personally don't want to support older k3s versions. I don't use Rancher, and their support matrix and updates seem to be at a pace which doesn't really align that well with NixOS's release lifecycle, so I'm wary it's not a great fit.

Is there some factor that makes tying our supported versions to Rancher's slower support matrix compelling?
Do I understand the issue you're seeing here correctly?

Member

> That seems true, but I also think that you can only encounter that issue if you're running an unsupported NixOS release. It seems totally expected that an unsupported NixOS release isn't supported, which I think sums up the issue there.

I mean, if someone wants to run an unsupported NixOS release and get backports of patch releases for k3s, it's not really hard to do (I would even go so far as to call it fairly trivial, in my experience), but you have to do it yourself or pay someone to do it.

Contributor

Well, this is unexpected and confusing, TBH.

I just used Rancher as an example. But I have a mix of operators, apps and custom deployments running in K8s, each of which evolves at a different pace. I just picked the first one that would cause a problem: Rancher, in this case. But it is quite easy to see that any of them can become a problem because of this choice on the NixOS side.

It is expected that all of them support the currently-supported k8s versions. But we can't expect all of them to support the latest k8s version at the date of launch of the latest NixOS version. Even less when there are 2 NixOS releases per year and 3 k8s releases per year. There'll always be some drift.

With the proposed "solution", you force NixOS users to choose between:

  1. Running on an unsupported K8s version.
  2. Running on an unsupported NixOS version.
  3. Upgrading using an unsupported process.
  4. Running apps / operators on unsupported platforms.

Not a very pleasant choice to make.

There are other cases where NixOS has various supported versions of the same app. On NixOS 22.11 alone, you can use python37, python38, python39, python310 and python311. Ain't that the magic of NixOS? Why can't we just do the same for k3s?

Member Author

@euank, May 20, 2023

Apologies for the slow response!

I guess I don't really know the best thing to do here. I agree that other apps/operators can lag behind some amount, which can make it more difficult to upgrade promptly.

Basically, I think the options we have on the NixOS side are:

  1. For each release, maintain the maximum number of versions of k3s that will be supported throughout the whole release
  2. For each release, maintain the minimum number that will allow a safe upgrade path
  3. For each release, maintain all k3s versions that were supported when the release was cut, even if they may go EOL during it.

I'm arguing for 2 because it's less maintenance work, and because in practice I haven't run into the issues you speak of. Everything I use has worked "fine" when upgrading, even if I upgrade before they announce official support or such. The k8s project's backwards compatibility story means that's supposed to typically be the case.

I believe you're arguing for 3, right?


We can then make a similar argument when NixOS 23.11 comes around, to not
include k3s 1.26 or 1.27. However, that means someone upgrading from the NixOS
23.05 release to NixOS 23.11 would not have a supported upgrade path.

In order to resolve this issue, we propose backporting not just new patch releases to older NixOS releases, but also new k3s versions, up to one version before the first version that is included in the next NixOS release.

In the above example, where NixOS 23.05 included k3s 1.26, and 23.11 included k3s 1.28, that means we would backport 1.27 to the NixOS 23.05 release, and backport all patches for 1.26 and 1.27.
This would allow someone to upgrade between those NixOS releases in a supported configuration.
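The backport rule can be sketched as a function of two inputs: the newest minor already in the current NixOS release, and the oldest minor shipped in the next one. It returns every minor in between, which means up to one version before the next release's first version. This is a minimal sketch with a hypothetical function name. It covers only which new minor versions are backported; patch releases for all of them are backported as well.

```python
def backport_minors(newest_in_current: str, oldest_in_next: str) -> list[str]:
    """Minor versions to backport into the current NixOS release.

    Everything after the newest minor the release already has, up to one minor
    before the oldest minor shipped in the next NixOS release.
    """
    major, newest = (int(p) for p in newest_in_current.split("."))
    next_major, oldest_next = (int(p) for p in oldest_in_next.split("."))
    if major != next_major:
        raise ValueError("only 1.x minor versions are modelled here")
    return [f"{major}.{m}" for m in range(newest + 1, oldest_next)]


# NixOS 23.05 ships 1.26 and 23.11 ships 1.28, as in the example above.
print(backport_minors("1.26", "1.28"))  # ['1.27']
```

The resulting path is 1.26 → 1.27 while still on NixOS 23.05, then 1.28 after moving to 23.11. It never skips a minor version.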