Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6.2.8 kernel revert needed for zfs-2.1.9 dkms #14658

Closed
comrade-meowski opened this issue Mar 23, 2023 · 32 comments
Closed

6.2.8 kernel revert needed for zfs-2.1.9 dkms #14658

comrade-meowski opened this issue Mar 23, 2023 · 32 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@comrade-meowski
Copy link

The latest stable kernel 6.2.8 causes zfs-2.1.9 & zfs-2.1.10-staging to fail during dkms installation of the zfs modules.

The error given in the dkms make.log is:

ERROR: modpost: GPL-incompatible module zfs.ko uses GPL-only symbol 'bio_start_io_acct'
ERROR: modpost: GPL-incompatible module zfs.ko uses GPL-only symbol 'bio_end_io_acct_remapped'

Building 6.2.8 with commit b477180587d81a4dd756361098bd534dfa6bb53d reverted allows zfs dkms to complete normally again. I have only tested with dkms but presumably this will more generally stop zfs building against 6.2.8 at all.

This is the kernel commit:

commit b477180587d81a4dd756361098bd534dfa6bb53d
Author: Yu Kuai <[email protected]>
Date:   Thu Feb 23 17:12:26 2023 +0800

    block: count 'ios' and 'sectors' when io is done for bio-based device
    
    [ Upstream commit 5f27571382ca42daa3e3d40d1b252bf18c2b61d2 ]
    
    While using iostat for raid, I observed very strange 'await'
    occasionally, and turns out it's due to that 'ios' and 'sectors' is
    counted in bdev_start_io_acct(), while 'nsecs' is counted in
    bdev_end_io_acct(). I'm not sure why they are ccounted like that
    but I think this behaviour is obviously wrong because user will get
    wrong disk stats.
    
    Fix the problem by counting 'ios' and 'sectors' when io is done, like
    what rq-based device does.
@comrade-meowski comrade-meowski added the Type: Defect Incorrect behavior (e.g. crash, hang) label Mar 23, 2023
@philmmanjaro
Copy link

That patch got originally added to 6.3-rc3 kernel and backported to 6.2.8

@rincebrain
Copy link
Contributor

rincebrain commented Mar 23, 2023

That's probably fixable, I would expect, based on reading the commit - I would guess that the difference is the signature change is making some kernel compat configure check fall back on something that's a GPL-only export now, and a simple update to that would resolve this...

e: something like 3ace9de (untested on 6.2+, I don't have that tree laying around conveniently)

e2: or more like the above plus 2e9c9b7

@simonbcn
Copy link

Arch Linux
Linux-xanmod 6.2.8

Same problem compiling from source.

@wtosta
Copy link

wtosta commented Mar 23, 2023

I confirm the issue on SlackWare Linux 15 x86_64 current compiling ZFS on 6.2.8 kernel. On Linux kernel 6.2.6 I haven't got any compilation issues.

@ptr1337
Copy link

ptr1337 commented Mar 23, 2023

Same issue here.
Log is in this action:
https://github.com/CachyOS/linux-cachyos/actions/runs/4492407990

@HougeLangley
Copy link

Same issue

https://bugs.gentoo.org/902821

@demmm
Copy link

demmm commented Mar 23, 2023

That's probably fixable, I would expect, based on reading the commit - I would guess that the difference is the signature change is making some kernel compat configure check fall back on something that's a GPL-only export now, and a simple update to that would resolve this...

e: something like 3ace9de (untested on 6.2+, I don't have that tree laying around conveniently)

e2: or more like the above plus 2e9c9b7

Both patches apply cleanly on 6.2.8, but no change in the build error, seems versions not used when applied to 6.2?
See messages:

checking whether blk_cleanup_disk() exists... no
checking whether generic bdev_*_io_acct() are available... no
checking whether generic disk_*_io_acct() are available... no
checking whether generic bio_*_io_acct() are available... yes

according to the patch, second line should be whether 6.2+ bdev_*_io_acct() are available

@rincebrain
Copy link
Contributor

rincebrain commented Mar 23, 2023 via email

@demmm
Copy link

demmm commented Mar 23, 2023

@rincebrain That was it (I thought it was enabled, but ended up being commented out), builds fine now.

rincebrain added a commit to rincebrain/zfs that referenced this issue Mar 24, 2023
Linux 6.3+, and backports from it, changed the signatures
on bdev_io_{start,end}_acct.

Add a case for it.

Fixes: openzfs#14658

Signed-off-by: Rich Ercolani <[email protected]>
rincebrain added a commit to rincebrain/zfs that referenced this issue Mar 24, 2023
Linux 6.3+, and backports from it, changed the signatures
on bdev_io_{start,end}_acct.

Add a case for it.

Fixes: openzfs#14658

Signed-off-by: Rich Ercolani <[email protected]>
rincebrain added a commit to rincebrain/zfs that referenced this issue Mar 24, 2023
Linux 6.3+, and backports from it, changed the signatures
on bdev_io_{start,end}_acct.

Add a case for it.

Fixes: openzfs#14658

Signed-off-by: Rich Ercolani <[email protected]>
@jonhermansen
Copy link

Is it fairly common that Linux removes or prevents access to certain symbols that zfs relies on? I recall this happening in the past, with encryption primitives...

rincebrain added a commit to rincebrain/zfs that referenced this issue Mar 24, 2023
Linux 6.3+, and backports from it, changed the signatures
on bdev_io_{start,end}_acct.

Add a case for it.

Fixes: openzfs#14658

Signed-off-by: Rich Ercolani <[email protected]>
@behlendorf
Copy link
Contributor

The kernel ABI is always a work in progress and it evolves as needed by the upstream kernel developers. This means for each new kernel release the OpenZFS code may need to be updated to accommodate recent kernel changes. It's for this reason OpenZFS includes an extensive set of configure checks to determine what interfaces are available.

Generally speaking it's fairly uncommon for this kind of breaking change to be backported to a released kernel. When it does happen the fix usually involves updating our checks to detect the modified interface. That's in fact what happened in this case. It's much less common for an interface to be unexpectedly removed or converted in some way which makes it unavailable. However, it does happen on occasion and it's something to keep in mind if you're running a more bleeding edge kernel. This is far less likely to occur with any of the enterprise distributions (Ubuntu LTS, RHEL, SUSE, etc).

We'll get this fixed up in the 2.1.10 release.

@rincebrain
Copy link
Contributor

rincebrain commented Mar 24, 2023 via email

@rincebrain
Copy link
Contributor

I just cut #14668, but still don't have a 6.3 tree handy, could someone check that I didn't break it in cleaning up various style nits?

@kode54
Copy link

kode54 commented Mar 26, 2023

Links against 6.2.8, but I want to make sure I build a fresh 6.3 before I test that. The last one I successfully built was rc1, and I don't know if they've changed anything significantly since then. That shouldn't take too long, though.

@ptr1337
Copy link

ptr1337 commented Mar 26, 2023

Links against 6.2.8, but I want to make sure I build a fresh 6.3 before I test that. The last one I successfully built was rc1, and I don't know if they've changed anything significantly since then. That shouldn't take too long, though.

I've tested to compile the above commit against 6.3rc3 and it did not work at all.
inode_owner_or_capable() is failing.

Above commit against 6.2.8 worked fine.

@imrehg
Copy link

imrehg commented Mar 26, 2023

Pretty related to #14555.

@rincebrain
Copy link
Contributor

Not really, no. That's an AArch64-specific breakage that's unrelated except also in a new kernel.

@lunasophia
Copy link

Is there anything I can do to help get this applied and released? I can't boot my machine because of this issue.

@Alanaktion
Copy link

Is there anything I can do to help get this applied and released? I can't boot my machine because of this issue.

Switching to the linux-lts kernel package with zfs-dkms builds fine for now and allows booting normally, as the backported change only affects more recent kernels.

@kode54
Copy link

kode54 commented Mar 26, 2023

Is there anything I can do to help get this applied and released? I can't boot my machine because of this issue.

Switching to the linux-lts kernel package with zfs-dkms builds fine for now and allows booting normally, as the backported change only affects more recent kernels.

Not terribly useful to use LTS if you have bleeding edge hardware like I do.

@lunasophia
Copy link

lunasophia commented Mar 26, 2023

Also not useful if you've upgraded your pools to have features that aren't in the modules in the current zfs-dkms version. This is why I offered to help more directly instead of asking for a workaround. I'm on Arch and have been tracking zfs-dkms-git so I can have an up to date kernel. I can't use an older version, I need a release to be able to boot my machine. Again, can I help accelerate this at all?

@rincebrain
Copy link
Contributor

Apply #14668 locally if you need this fixed more urgently than releases come out?

If you want to run the latest kernel, there's always going to be a timelag between Linux merging breaking changes and them being fixed, in git or in a release branch, in OpenZFS, which is why I usually advise people don't run on a continuously-updating latest Linux kernel versus a known quantity if they need to rely on OpenZFS.

@isopix
Copy link

isopix commented Mar 26, 2023

Also not useful if you've upgraded your pools to have features that aren't in the modules in the current zfs-dkms version. This is why I offered to help more directly instead of asking for a workaround. I'm on Arch and have been tracking zfs-dkms-git so I can have an up to date kernel. I can't use an older version, I need a release to be able to boot my machine. Again, can I help accelerate this at all?

Here it goes:

diff --git a/PKGBUILD b/PKGBUILD
index 6e04436..badc10c 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -10,7 +10,7 @@
 #
 
 pkgname=zfs-dkms-git
-pkgver=2.1.99.r1726.g7883ea2234
+pkgver=2.1.99.r1854.ga05263b7aa
 pkgrel=1
 epoch=2
 pkgdesc='Kernel modules for the Zettabyte File System.'
@@ -19,15 +19,18 @@ url='https://zfsonlinux.org/'
 license=('CDDL')
 groups=('zfs-git')
 makedepends=('git')
-provides=("ZFS-MODULE=${pkgver}" "SPL-MODULE=${pkgver}" "${pkgname%-git}=${pkgver}" 'spl-dkms')
+provides=("ZFS-MODULE=${pkgver}" "SPL-MODULE=${pkgver}" "${pkgname%-git}=${pkgver}" 'spl-dkms' "zfs" "zfs-headers" "spl" "spl-headers")
 conflicts=("${pkgname%-git}" 'spl-dkms')
 replaces=('spl-dkms-git')
 source=("git+https://github.com/openzfs/zfs.git"
-        "0001-only-build-the-module-in-dkms.conf.patch")
+        "0001-only-build-the-module-in-dkms.conf.patch"
+	"zfs_kernel6.2.8+.patch")
 sha256sums=('SKIP'
-            '539f325e56443554f9b87baff33948b91a280ec1daadcb0c636b105252fcd0f5')
+            '539f325e56443554f9b87baff33948b91a280ec1daadcb0c636b105252fcd0f5'
+            'ce62c656444a0456291f6491795cd7629b8f7621a08f8c3bc51dd77788a40a5e')
 b2sums=('SKIP'
-        'a8ab5da81d214e7801f0f8cdf77c076c714a3f17292df15ca35fcf7aef2c4d505348797e3b1da7590ea303ff488490ddba49e6f9e3f8a0bcc975894d51d97c2b')
+        'a8ab5da81d214e7801f0f8cdf77c076c714a3f17292df15ca35fcf7aef2c4d505348797e3b1da7590ea303ff488490ddba49e6f9e3f8a0bcc975894d51d97c2b'
+        '0a91acccebbc33631343b627c8d0c5ce339f4a725682c62323e190ad560f4ffaeeebcaa79e93754dbd325133994922d1aab5add70a64bf0bc1ac1b6cf48d7b51')
 
 pkgver() {
     cd zfs
@@ -39,6 +42,7 @@ prepare() {
     cd zfs
 
     patch -p1 -i ../0001-only-build-the-module-in-dkms.conf.patch
+    patch -p1 -i ../zfs_kernel6.2.8+.patch
 
     # remove unneeded sections from module build
     sed -ri "/AC_CONFIG_FILES/,/]\)/{

for "zfs_kernel6.2.8+.patch" it's there:
https://903117.bugs.gentoo.org/attachment.cgi?id=859043
from there:
https://bugs.gentoo.org/903117

@admnd
Copy link

admnd commented Mar 26, 2023

Apply #14668 locally if you need this fixed more urgently than releases come out?

Applied onto ZFS 2.1.9 and works wonderfully well here even with Linux 6.2.8. Good job @rincebrain.

The aforementioned Gentoo bug #903117 is nothing more than changes brought in by PR #14668 put together in a single patch file.

@isopix
Copy link

isopix commented Mar 26, 2023

Apply #14668 locally if you need this fixed more urgently than releases come out?

Applied onto ZFS 2.1.9 and works wonderfully well here even with Linux 6.2.8. Good job @rincebrain.

The aforementioned Gentoo bug #903117 is nothing more than changes brought in by PR #14668 put together in a single patch file.
you are right. I was thinking it's archzfs github, but its openzfs ;-)

@bsdice
Copy link

bsdice commented Mar 27, 2023

Fellow Arch user here, I also got bit by this. Currently using zfs-dkms on 6.2.8 with reverted b477180587d81a4dd756361098bd534dfa6bb53d. My thanks to @comrade-meowski for identifying the offending commit.

I also use linux and not linux-lts because imho "LTS" is not really stable. Well, "stable" is not stable we just found out. ;-) There's a constant stream of backports introducing regressions, which then nobody figures out. In the tree close to what Torvalds releases, I think more testing is done. Since I need a bunch more kernel patches (debug symbols, wifi, mellanox, etc.) I use a script to build patched kernels through auto-patched PKGBUILDs anyway. Learned this trick to create a reversal patch from a forward patch:

interdiff -q 628-zfs.patch /dev/null > 628-rev-zfs.patch

To add, thanks for the advice to not discuss on LKML or elsewhere. Obliging... Just wish a big distro like Arch would upgrade ZFS to a first class citizen, possibly endorsing zfsbootmenu to boot zfs snapshots straight from a menu out of UEFI. Once people are hooked on snapshots, encryption, compression, CoW, checksums for everything, I'd hope upstream would no longer be able to ignore the only filesystem I trust.

@kode54
Copy link

kode54 commented Mar 27, 2023

With the above linked PR, you don't need to modify the kernel in any way to use 6.2.8.

Also, #14622 is relevant to further impending doom fun for anyone who wants to get involved, beyond the kind person in the issue who reported to be working on it.

@youzhongyang
Copy link
Contributor

Yes, a PR for #14622 will soon be ready for review. Test suite run has passed on 6.3-rc2, now running on a 5.15 kernel.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Mar 27, 2023
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#14658
Closes openzfs#14668
andrewc12 pushed a commit to andrewc12/openzfs that referenced this issue Mar 27, 2023
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#14658
Closes openzfs#14668
behlendorf pushed a commit that referenced this issue Mar 28, 2023
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes #14658
Closes #14668
@andr31g
Copy link

andr31g commented Mar 30, 2023

For those who just want a quick rundown of the steps involved, here is what I did (Fedora 37 Workstation):

(Note that the system was previously building the zfs dkms package just fine, and so has all the necessary pre-requisites already installed, such as e.g., kernel-devel, etc)

Thank you for the quick fix @rincebrain!

# dnf install autoconf automake libtool
# cd /usr/src
# wget https://patch-diff.githubusercontent.com/raw/openzfs/zfs/pull/14668.patch
# cd zfs-2.1.9/
# patch -p1 -i ../14668.patch
# ./autogen.sh
# dkms build -m zfs -v 2.1.9
# dkms install -m zfs -v 2.1.9
# modprobe zfs
# modinfo zfs | head
filename:       /lib/modules/6.2.8-200.fc37.x86_64/extra/zfs.ko.xz
version:        2.1.9-1
...
#

HTH

mitchejj pushed a commit to mitchejj/ostree-zfs-kmod that referenced this issue Mar 30, 2023
@simonbcn
Copy link

simonbcn commented Apr 1, 2023

In the title of this error dkms has been added but the problem is general: both using dkms and compiling directly without dkms.

On the other hand, if this is fixed but you have not released a new version with this patch, I don't think the issue should be closed. With the current version 2.1.9 this still fails.

When is it planned to release a version with the patch?

@adroslice
Copy link

For root on zfs users, I highly recommend adding a second kernel and a corresponding bootloader entry to have a fallback for cases exactly like this. It kind of saved me on friday.

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 5, 2023
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#14658
Closes openzfs#14668
@castillar
Copy link

My home Fedora server has been bitten by this type of issue a couple times in the past — I know it can take ZFS a bit to catch up with kernel releases, so I've tried to be more careful about checking updates and not just blindly slapping them in.

However, to keep from getting surprised, I've also done this and I recommend it to anyone on Fedora looking to make sure this doesn't bite them:

dnf install 'dnf-command(versionlock)'
dnf versionlock add kernel-core kernel-devel kernel-modules

This will lock the kernel packages to the current version and prevent upgrades or removals. Once I've checked to make sure the next kernel rev is safe, I then run dnf versionlock clear, which removes the locks and allows the upgrade. Then I slap it back in place again to put the safety back on the foot-gun. :)

ProxBot pushed a commit to proxmox/pve-kernel that referenced this issue May 18, 2023
Update the ZFS submodule so that it includes a commit with compat fix
[0] for kernel 6.2.8, which otherwise regressed build through the
484c2be84b49 ("block: count 'ios' and 'sectors' when io is done for
bio-based device") commit, which was backported to stable-6.2 from
the v6.3-rc3 "release".

[0]: openzfs/zfs@59f1875

Link:  openzfs/zfs#14658
Signed-off-by: Thomas Lamprecht <[email protected]>
pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#14658
Closes openzfs#14668
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests