-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linux 6.11 compatibility #16400
Linux 6.11 compatibility #16400
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @robn, I was able to build this against 6.11rc1 without issue.
One must imagine Sisyphus happy.
😆 😆 😆
When I build this against the zfs-2.2.5 patchset I see memcpy warnings in the dmesg on 6.11.0-rc1:
Example:
The relevant zfs_log.c it is complaining about is this: https://github.com/openzfs/zfs/blob/544b6c1593874eacd73390970cde6c45a5817cfa/module/zfs/zfs_log.c |
Yeah, I saw these too in the ZTS run. As far as I can tell they're part of the kernel FORTIFY stuff; we're I suggest they're probably worth looking into, but separately - they're static analysis warnings, and need to be assessed and fixed or silenced as appropriate. They're not part of this patch set as such. |
d863574
to
b753d5a
Compare
I understand what the warnings are about now. When built with CONFIG_FORTIFY_SOURCE, 6.11 is checking that the destination for memcpy has enough room. The various lr_XX_t structs usually have extra allocation and data after them, but this isn't visible to the compiler, so it reports the target memory regions as too small. I have a couple of prototype patches to address this. Mostly, it's adding flex arrays to structs that have an extra allocation after them, but some rework is required for lr_create_t and lr_rename_t, as Clang doesn't support flex arrays anywhere but at end of struct. There's also some code change required, because so many places do raw pointer math to get past the end of struct, and that doesn't always work out exactly the same with the necessary type changes. Fortunately there's no actual bugs (assuming our existing math is right, which I assume it is), it's just log noise. Still needs fixing to avoid bug reports (and also a little extra rigor doesn't hurt), but no real issue here. Hopefully I can post a patch that isn't too horrible in the next few days. |
I am getting an error when trying to compile with this patch on fedora 39 this was triggered by running on the latest main branch |
I just tried this patch set. I used 6.11rc2, never tried rc1. |
I have had success using this patch with both 6.11.0-rc1 and 6.11.0-rc2 (using my own locally built kernels on x86_64 machines) using Ubuntu 24.04, with my own customized release which is the 2.2.5-staging PR with this PR on top of it: https://launchpad.net/~satadru-umich/+archive/ubuntu/zfs-experimental/ I am not having problems with a shutdown. Reboots are working just fine for me. |
I guess I'll have to try what you did except I'll base it on zfs-2.2.5-hutter. I didn't think to do this because the 6.10 patch set just worked. |
@ascendbeing any interesting console output when this happens? What if you export the pool then unload the zfs module before reboot? I built against -rc2 this morning. My very first bench check run suspended the pool during scrub, but half a dozen runs since have been fine. I don't really trust my bench check for this, it is .. quirky, and this isn't unheard of behaviour for it. But, coincidences and all that. I'm yet to self-review this PR, but I'll do it soon, and review the rc1->rc2 changes at the same time. |
I have 3 pools. My rootfs being one of them. I am using openrc not systemd so if there's any mechanism short of installing systemd to grab the output (I use metalog I think? with stock config basically), I'm willing to try it. What specifically is happening is I issue a poweroff command and then instead of fully shutting down, monitor turns off and won't turn back on, and PC won't turn off, won't respond to magic sysrq (I have it 0x1 enabled), only way to turn it off is hard power off. Doesn't respond to additional button presses, no network connectivity. Edit also yes I've tried waiting a really long time (30+ min. possibly up to several hours) |
@ascendbeing I mean, your system logs the kernel messages somewhere presumably? Not much we can do to debug this without a console. From your description that's getting extremely late in the shutdown process, to the extent that the keyboard handler can't invoke sysrq operations? A serial console might be the go. I'm not saying its not OpenZFS at fault, but it would be extremely low-level, like, it hasn't released some resource somewhere. I can't off the top of my head think what that would be. |
@wmmur I understand, however this PR is building fine for me on 6.11-rc2 and that file on the 6.11 branch has not changed since May. So clearly there's more going on, and its probably the same thing that is causing your 6.10 build to fail too. So, unless you have new information specific to 6.11, lets just keep it over there for the moment, otherwise we'll end up with bits of conversation in both places, which gets confusing. |
I recently added "reboot=efi,warm mce=bootlog" to my boot arguments. Shuts down fine on 6.10. I also have power profiles daemon broken(crashes on startup somehow) if that matters. I have a raspberry 3b if that can be used to do stuff like acquire really late messages etc. or I could connect another PC via 2.5GbE Ethernet if that works. not to pcie or tb. 2.5GbE on crashy machine to 2.5GbE on other rig If I have to I can snapshot the rootfs and install systemd |
@ascendbeing sorry, you've got a lot more going on there than I know anything about, so I'm not really able to help much more. Divide and conquer is the game from here; try to find out if OpenZFS is the cause or something else. (With the root on OpenZFS, if it were me I'd be trying to set up a live USB with this kernel and try to reproduce the situation that way, where there's a reasonable chance I can unload |
Using this patch on top of 2.2.5, things are looking fine for me with kernel 6.11.0-rc3 on both Ubuntu 24.04 and Ubuntu 24.10/dev. |
@robn The crash on shutdown doesn't occur on 6.10. It happens on HEAD and zfs-2.2.5-hutter w/ original version patch when used with 6.11. The original version of this patch doesn't affect shutting down with 6.10. Probably only with rootfs on 6.11 only. I don't have a spare drive to try to install the same kernel config kernel with non rootfs ATM, but I will probably try the new version eventually. against 2.2.5 release probably. 1 chunk doesn't apply to the hutter 2.2.5 fork anymore. |
It gets hairier again in Linux 6.11, so I want some actual theory of operation laid out for next time. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
In 6.11 struct queue_limits gains a 'features' field, where, among other things, flush and write-cache are enabled. Detect it and use it. Along the way, the blk_queue_set_write_cache() compat wrapper gets a little cleanup. Since both flags are alway set together, its now a single bool. Also the very very ancient version that sets q->flush_flags directly couldn't actually turn it off, so I've fixed that. Not that we use it, but still. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
It's no longer available directly on the request queue, but its easy to get from the attached disk. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
Detect it, and use a macro to make sure we always match the prototype. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
Apply them with with the rest of the settings. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
These fields are very old, so no detection necessary; we just move them into the limit setup functions. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
Since the change to folios it has just been a wrapper anyway. Linux has removed their wrapper, so we add one. Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/
24803a4
to
50229a1
Compare
@ascendbeing yep, if you can find out anything at all I can look into it, but right now its very much looking like something strange in your particular setup. If its useful, here's this patch series atop the released 2.2.5 tag: https://github.com/robn/zfs/commits/linux-6.11-compat-2.2.5/. I've compiled it but not run it, ymmv. |
@wmmur I already said that your build issue is unrelated to this PR. Please stop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for pushing this boulder a little bit father!
In 6.11 struct queue_limits gains a 'features' field, where, among other things, flush and write-cache are enabled. Detect it and use it. Along the way, the blk_queue_set_write_cache() compat wrapper gets a little cleanup. Since both flags are alway set together, its now a single bool. Also the very very ancient version that sets q->flush_flags directly couldn't actually turn it off, so I've fixed that. Not that we use it, but still. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
It's no longer available directly on the request queue, but its easy to get from the attached disk. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
Detect it, and use a macro to make sure we always match the prototype. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
Apply them with with the rest of the settings. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
These fields are very old, so no detection necessary; we just move them into the limit setup functions. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
Since the change to folios it has just been a wrapper anyway. Linux has removed their wrapper, so we add one. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
It gets hairier again in Linux 6.11, so I want some actual theory of operation laid out for next time. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
In 6.11 struct queue_limits gains a 'features' field, where, among other things, flush and write-cache are enabled. Detect it and use it. Along the way, the blk_queue_set_write_cache() compat wrapper gets a little cleanup. Since both flags are alway set together, its now a single bool. Also the very very ancient version that sets q->flush_flags directly couldn't actually turn it off, so I've fixed that. Not that we use it, but still. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
It's no longer available directly on the request queue, but its easy to get from the attached disk. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
Detect it, and use a macro to make sure we always match the prototype. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
Apply them with with the rest of the settings. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
These fields are very old, so no detection necessary; we just move them into the limit setup functions. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
Since the change to folios it has just been a wrapper anyway. Linux has removed their wrapper, so we add one. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: https://despairlabs.com/sponsor/ Closes openzfs#16400
@robn just gonna leave you and all a thank you because I can report (over last few days I've confirmed, w/ final release) that 6.11 reboot hang is gone. My configuration has drifted a good deal, but when I reported about this here, I had tried changing my .config kinda radically, and this time it just works, without any flailing. |
@ascendbeing well I'm glad to hear it! |
Did this miss a META update? I'm seeing
|
There are still some areas which need to be wrapped up for full 6.11 support. We'll bump the META version once everything is integrated and tested with the final 6.11 release kernel. |
Motivation and Context
One must imagine Sisyphus happy.
Closes #16396.
Description
See individual commits.
Note that most of this is applying block queue API changes to zvol. However, since the code is getting very complex, I've done a bit of cleanup in a way that will change when and how some queue settings are applied. That has the potential to break things that already work on older kernels. I doubt any fallout will be significant, but do take care on review and testing.
How Has This Been Tested?
Compiled and passed basic sanity check (create, fio, export, import, scrub, unload) on kernels:
Full ZTS run on 6.11.0-rc1 currently in progress; should be finished by the morning. I'll update this post when its done.
Update: test suite passed (within normal flakiness bars of my setup).
Types of changes
Checklist:
Signed-off-by
.