zfs: stable supports up to Linux 6.4 #245561

Closed
wants to merge 2 commits into from

Conversation

RaitoBezarius
Member

@RaitoBezarius RaitoBezarius commented Jul 26, 2023

Description of changes

ZFS stable does in fact support up to 6.4. This is not clearly communicated in the META file,
but it does not matter: I have been running 6.4 for a while with ZFS (and even bcachefs).

Unblocks #244883.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@RaitoBezarius RaitoBezarius requested a review from Mic92 July 26, 2023 16:37
@RaitoBezarius RaitoBezarius changed the title zfs: stable supports up to 6.4 zfs: stable supports up to Linux 6.4 Jul 26, 2023
ZFS stable supports up to Linux 6.4 in fact, this is not clearly communicated in the META file
but it does not matter, I have been running 6.4 for a while with ZFS (and even bcachefs).
The former maintainers are not active anymore.
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 26, 2023
@ius
Contributor

ius commented Jul 26, 2023

First of all: really appreciate the effort of trying to avoid stepping zfs compat backwards. Thanks!

That said, I am slightly concerned about deviating from upstream's stability guarantees. According to this comment from a ZFS developer:

The version in the META file represents latest kernel which we have tested. Fast moving downstream distributions, and end users building from source, are always welcome to perform their own testing and bump that max version as they see fit. What we want to avoid is misleading anyone about what has, and has not, been tested by the developers.

I don't think we can match the same level of testing, or can we? Filesystem bugs are danger zone™

Personally, I'd prefer keeping EOL kernels around until zfs formally catches up, but that touches on another (controversial) debate..

@RaitoBezarius
Member Author

RaitoBezarius commented Jul 26, 2023 via email

@numinit
Contributor

numinit commented Jul 27, 2023

I think this is a fine approach.

@numinit
Contributor

numinit commented Jul 27, 2023

Note that ZFS' META was recently updated to indicate Linux 6.4 compat:

openzfs/zfs@fb344f5

Provided, this is their master, but I'd expect recent-ish releases (maybe 2.1.13 soon?) to include it.

@adamcstephens
Contributor

I don't think I'm comfortable marking zfs for support higher than what upstream does. While 2.1.12 has fixes for 6.4, for whatever reason upstream did not mark it as supported. Even the first two RC's of 2.2 are only 6.3, but I hope that to change for the next release.

Given the unpredictability on when a kernel will be downgraded, what if instead we remove latestCompatibleLinuxPackages completely? Users can always manually pin to a specific kernel version, and this is what I do to avoid downgrading. Then when that kernel support is dropped they will have to make an explicit choice to downgrade, switch to zfs-unstable, wait for zfs to officially support the version they're on, or other alternatives which are the user's responsibility..

As @RaitoBezarius points out, users who are more concerned about stability likely should stay on LTS if at all possible.
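
For reference, the two approaches contrasted above look roughly like this in a NixOS configuration. This is a minimal sketch, not a complete ZFS setup (for example, `networking.hostId` is left out), and `linuxPackages_6_1` is only an illustrative choice of pinned kernel, not a recommendation made in this thread.

```nix
{ config, pkgs, ... }:
{
  boot.supportedFilesystems = [ "zfs" ];

  # Option A: follow the newest kernel that nixpkgs currently considers
  # compatible with the configured ZFS package. This can move backwards
  # when a non-LTS kernel is removed from nixpkgs.
  # boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

  # Option B: pin the kernel explicitly (illustrative version). When this
  # attribute is removed or becomes incompatible with ZFS, the configuration
  # stops evaluating instead of switching kernels behind your back.
  boot.kernelPackages = pkgs.linuxPackages_6_1;
}
```

With option B the trade-off is the one described above: the user has to make an explicit choice each time the pinned kernel goes away.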

@RaitoBezarius
Member Author

RaitoBezarius commented Jul 27, 2023

I don't think I'm comfortable marking zfs for support higher than what upstream does. While 2.1.12 has fixes for 6.4, for whatever reason upstream did not mark it as supported. Even the first two RC's of 2.2 are only 6.3, but I hope that to change for the next release.

I'm honestly not convinced by this argument. As I said in my first post, does anyone know what bumping the META file entails in terms of testing? If all that happens is that they idly check that it passes the OpenZFS test suite and then bump it, do OpenZFS developers owe you anything if it eats your data?

Anyway, I will not force my way towards this PR in any case, I am already using this patch on my production systems for > 3 weeks now.

In all cases, I will now ask to proceed with the dropping PR.

@RaitoBezarius
Member Author

Given the unpredictability on when a kernel will be downgraded, what if instead we remove latestCompatibleLinuxPackages completely? Users can always manually pin to a specific kernel version, and this is what I do to avoid downgrading. Then when that kernel support is dropped they will have to make an explicit choice to downgrade, switch to zfs-unstable, wait for zfs to officially support the version they're on, or other alternatives which are the user's responsibility..

Also, I'm not in favor of that, because I use latestCompatibleLinuxPackages quite successfully on my side. I agree that pinning kernel versions is better, though.

It's a tough call because I feel like we are going into circles regarding this problem.

@adamcstephens
Contributor

It's a tough call because I feel like we are going into circles regarding this problem.

Yeah, I understand this sentiment completely. My personal opinion would be to leave the EOL kernel in tree, but mark it as insecure. I don't maintain the kernel package though and I know that suggestion is not favored by many people.

Whatever the solution, I'd rather fail evaluating the config rather than silently downgrading. The status quo doesn't impact me though, given that I pin my kernel when not using LTS.
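
One way to get a failing evaluation today, without changing nixpkgs, is a module-level assertion on the selected kernel version. The following is a sketch under assumptions: the minimum version `"6.3"` is a hypothetical value a user would choose for their own hardware, and the message text is made up for illustration.

```nix
{ config, lib, ... }:
{
  boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

  assertions = [
    {
      # Hypothetical lower bound: refuse to build a system whose kernel
      # has been moved below the version this machine needs.
      assertion = lib.versionAtLeast config.boot.kernelPackages.kernel.version "6.3";
      message = ''
        The selected kernel (${config.boot.kernelPackages.kernel.version}) is older
        than 6.3; latestCompatibleLinuxPackages has probably been downgraded.
      '';
    }
  ];
}
```

The rebuild then aborts with the message instead of producing a generation that boots an older kernel.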

@RaitoBezarius
Member Author

It's a tough call because I feel like we are going into circles regarding this problem.

Yeah, I understand this sentiment completely. My personal opinion would be to leave the EOL kernel in tree, but mark it as insecure. I don't maintain the kernel package though and I know that suggestion is not favored by many people.

I am extremely against this solution personally.

Whatever the solution, I'd rather fail evaluating the config rather than silently downgrading. The status quo doesn't impact me though, given that I pin my kernel when not using LTS.

Silent downgrading is definitely not supposed to break anything, because downgrading will always send you to at least the LTS kernel… Of course, reality is much more annoying than that, but if you are a user who needs a non-LTS kernel with ZFS, you are out of luck unless you buy into insecure kernels.

@RaitoBezarius
Member Author

Anyway, I also pinged ZFS upstream to ask them about 6.4 META for 2.1.(12|13), we will see how it goes. Let's keep this PR as a draft for now.

@RaitoBezarius RaitoBezarius marked this pull request as draft July 27, 2023 16:52
@numinit
Contributor

numinit commented Jul 28, 2023

Thanks. It seems like they got a handful of 6.4 compat fixes in for 2.1.12, though I don't know if those are exhaustive.

@adamcstephens
Contributor

adamcstephens commented Jul 28, 2023

2.2.0-rc3 does indeed officially list 6.4 support.

@@ -13,10 +13,10 @@ callPackage ./generic.nix args {
  # check the release notes for compatible kernels
  kernelCompatible =
    if stdenv'.isx86_64 || removeLinuxDRM
    then kernel.kernelOlder "6.4"
Member

Please also update unstable.
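
To make the bound in the hunk above concrete: `kernel.kernelOlder` is a strict "older than" version comparison, so a bound of `"6.4"` accepts 6.3.x kernels and rejects 6.4.x. The sketch below reproduces that comparison with `lib.versionOlder`; the helper name `kernelOlder` and the sample kernel versions are illustrative, and reading the intended new bound as `"6.5"` is an assumption, since the full diff is not shown here.

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Illustrative stand-in for kernel.kernelOlder: true when `version`
  # is strictly older than `bound`.
  kernelOlder = version: bound: lib.versionOlder version bound;
in
{
  # 6.3.x passes a "6.4" bound.
  "6.3.13 with bound 6.4" = kernelOlder "6.3.13" "6.4";
  # 6.4.x does not pass a "6.4" bound...
  "6.4.6 with bound 6.4" = kernelOlder "6.4.6" "6.4";
  # ...but would pass a "6.5" bound (assumed intent of "supports up to 6.4").
  "6.4.6 with bound 6.5" = kernelOlder "6.4.6" "6.5";
}
```

Saved to a file, this can be evaluated with `nix-instantiate --eval --strict` to see which combinations are accepted.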

@Mic92
Member

Mic92 commented Jul 30, 2023

I bump zfs unstable here: #246163

@Ma27
Member

Ma27 commented Jul 31, 2023

Yeah, I understand this sentiment completely. My personal opinion would be to leave the EOL kernel in tree, but mark it as insecure. I don't maintain the kernel package though and I know that suggestion is not favored by many people.

fwiw https://nixos.org/manual/nixos/stable/index.html#sec-linux-zfs
In other words, downgrading the minimum compatible zfs version was the policy the kernel package maintainers (including me) followed in the past already.

I know that this is a debate we had in the past already (see also https://discourse.nixos.org/t/aggressive-kernel-removal-on-eol-in-nixos/23097 for more context) and I can understand the pain considering that I'm also using ZFS pretty heavily, but as a package maintainer of the linux kernels I'm pretty strictly against keeping EOLed kernels in here.

@adamcstephens
Contributor

I know that this is a debate we had in the past already

Sorry if this feels like a rehash of previous discussions. To be clear, I completely respect your stance.

Unfortunately, I suspect this will keep coming up without some change to the status quo. This PR seems to be yet another example of the discord caused by the available latestCompatibleLinuxPackages option and the lifecycle of the available kernel.

@RaitoBezarius
Member Author

Closing, in favor of #246300.
ZFS stable will probably stay like this until it gets upgraded to 2.1.13 release.

@Shados
Member

Shados commented Aug 13, 2023

Silent downgrading is definitely not supposed to break anything, because downgrading will always send you to at least the LTS kernel… Of course, reality is much more annoying than that, but if you are a user who needs a non-LTS kernel with ZFS, you are out of luck unless you buy into insecure kernels.

Yes, this is nice in theory, untrue in reality. For example: the latestCompatibleLinuxPackages change from 6.3 to 6.1 broke my display. I'm using an RDNA 3 card, and their support under 6.1 is pretty rough (needed several workarounds/hacks to get reasonably functional), while they're much better handled in 6.3. The silent part is particularly annoying: if I'd been warned about it, I could have re-instituted said workarounds prior to rebooting, rather than having to waste time debugging the apparently freshly-broken graphical output.

The proposition that using a recent, but EOL kernel is automatically insecure is a bit of an oversimplification. Whether or not there's any real risk in that depends on the user's threat model and the specific set of security patches that are lacking. It does seem clear that using an EOL kernel merits at least a warning, and probably explicit opt-in (via the existing permittedInsecurePackages mechanism, perhaps?), but the current eager removal policy is very aggressive and continues to cause issues for people.

As another option, distributions backporting security patches to EOL kernels isn't unheard of. But that would obviously present a maintenance burden to the kernel package maintainers, so understandable if that route isn't taken.

@RaitoBezarius
Member Author

Silent downgrading is definitely not supposed to break anything, because downgrading will always send you to at least the LTS kernel… Of course, reality is much more annoying than that, but if you are a user who needs a non-LTS kernel with ZFS, you are out of luck unless you buy into insecure kernels.

Yes, this is nice in theory, untrue in reality. For example: the latestCompatibleLinuxPackages change from 6.3 to 6.1 broke my display. I'm using an RDNA 3 card, and their support under 6.1 is pretty rough (needed several workarounds/hacks to get reasonably functional), while they're much better handled in 6.3. The silent part is particularly annoying: if I'd been warned about it, I could have re-instituted said workarounds prior to rebooting, rather than having to waste time debugging the apparently freshly-broken graphical output.

I think this is still very true in reality for a lot of folks. As you are using an RDNA3 card, and you seem to be aware of the requirement to run the latest stable kernels, I would argue this is probably on us for not telling people clearly enough not to use ZFS in those situations if they cannot accommodate this silent downgrade, which would probably have avoided the issue.

Introducing a warning is delicate because we have no way to talk to unstable users in any meaningful way yet except a Discourse post (in which I would have been curious if you had seen the notice).

We could have a state mechanism to report silent downgrades at activation time. I would also strongly recommend that people read nvd / nix diff reports between the booted system and the running system, and write some automation to learn about all potentially unwanted downgrades, e.g. systemd, linux.
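
As a rough illustration of that kind of automation, an activation script can print an nvd report whenever a new generation is activated. This is a sketch under assumptions: the script name `report-closure-diff` is made up, and it compares the currently running system with the one being activated rather than the booted one (for the latter, `nvd diff /run/booted-system /run/current-system` can be run by hand).

```nix
{ pkgs, ... }:
{
  # Hypothetical script name; prints package version changes, including
  # kernel downgrades, each time a new system generation is activated.
  system.activationScripts.report-closure-diff = {
    supportsDryActivation = true;
    text = ''
      if [ -e /run/current-system ]; then
        ${pkgs.nvd}/bin/nvd --nix-bin-dir=${pkgs.nix}/bin \
          diff /run/current-system "$systemConfig" || true
      fi
    '';
  };
}
```

The `|| true` keeps a failing report from aborting activation; a kernel that moves backwards then at least shows up in the output before the next reboot.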

Overall, there's a lot to do, but I don't feel like it's a good usage of maintainer energy as we don't offer any form of strong promise on ZFS and latest stable kernels, which, I believe, is something that almost no community Linux distribution offers either. Anyone is free to drop a solution for their use case which is general enough for this class of problem, of course.

The proposition that using a recent, but EOL kernel is automatically insecure is a bit of an oversimplification. Whether or not there's any real risk in that depends on the user's threat model and the specific set of security patches that are lacking. It does seem clear that using an EOL kernel merits at least a warning, and probably explicit opt-in (via the existing permittedInsecurePackages mechanism, perhaps?), but the current eager removal policy is very aggressive and continues to cause issues for people.

It has been discussed in the post mentioned above. permittedInsecurePackages does not really make sense for this type of package IMHO. A new infrastructure is being merged (NixOS/rfcs#127 / #177272); I would direct all efforts there and reopen the discussion about potentially attaching problems to EOL kernels. Then anyone can opt in with their handlers and do their thing.

As another option, distributions backporting security patches to EOL kernels isn't unheard of. But that would obviously present a maintenance burden to the kernel package maintainers, so understandable if that route isn't taken.

Yeah, I will say politely, this is a non-starter given our human resources in terms of maintenance, unless some people want to start an Open Collective to fund a full-time kernel maintainer in nixpkgs for that.
Though, I would prefer to spend that money somewhere else.


Anyway, I apologize for your bad experience, I guess I will try to open a PR on documenting better when you should not use ZFS on NixOS and include those cases for now, because I don't see the silent downgrade situation changing soon except if folks want to start paying Nixpkgs ZFS maintainers for doing the dance between: nixpkgs, kernel and upstream OpenZFS, which is highly painful honestly.

@Ma27
Member

Ma27 commented Aug 13, 2023

FWIW we should probably stop recommending latestCompatibleLinuxPackages and encourage folks to actually pin their kernel, which is especially sensible when using out-of-tree modules like ZFS that don't receive any support from the kernel. That way, it's at least clear that there's a problem before booting into a configuration with broken graphics.

@RaitoBezarius
Member Author

FWIW we should probably stop recommending latestCompatibleLinuxPackages and encourage folks to actually pin their kernel, which is especially sensible when using out-of-tree modules like ZFS that don't receive any support from the kernel. That way, it's at least clear that there's a problem before booting into a configuration with broken graphics.

The problem with this is that we will still get complaints from folks not understanding why linux_6_3 disappeared, IMHO.

Anyway, in those situations, I feel like no matter what we do, there will be problems and issues, so I feel inclined to minimize the energy spent until someone who cares enough sends a PR to change things.

Labels
10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux

7 participants