Linux 4.1 oops with loop devices on ZFS #3511
Comments
I've seen this in the test suite as well, but never had the time to investigate it.
@l1k thanks for the detailed issue report and particularly the reproducer. Once someone has time to look carefully into this, that should make it much easier to determine what's wrong. I briefly looked at the stacks, and on the surface there's nothing special going on here. Just normal I/O, and very little of the stack actually implicates ZFS. Still, it needs to be explained.
This merge from 4.1-rc6 may be relevant: torvalds/linux@0f1e5b5
I think I found other ways to reproduce this issue: while using Docker 1.7 on a ZFS root, I get an oops if I use devmapper or aufs as the backing store, but not zfs or vfs. It's also instantly reproducible — start the binary and the kernel panics. Kernel 4.1.3.
Bisected to torvalds/linux@aa4d861 ("block: loop: switch to VFS ITER_BVEC"). Unfortunately, the commit message is somewhat terse, but the commit changes loop.c to expect the requests enqueued by blk_mq to contain bio_vec instead of kvec structures. The commit also refactors the code for transforming the data read from or written to a loop device; this code is used for encrypted loop devices. However, the loop devices I've tested this with are unencrypted, i.e. they use the default, non-transforming path. So, the commit looks sane, and it does work if the file backing the loop device is located on a FAT partition instead of a ZFS dataset; reverting the commit fixes the issue. Hm, are requests backed by kvec if the file backing the loop device is located on a ZFS dataset, and by bvec otherwise? Why?
@l1k nice job bisecting this to the offending commit. I had a quick look, and one thing I noticed is that this patch modifies the code to do I/O via .iter_write() instead of .write(). Now, that should work fine, but I notice in the stack traces that …
I'll note that the terse commit comment for torvalds/linux@aa4d861 is due to it being part of a large merge commit (torvalds/linux@4fc8adcf). There have also been a number of other batches of VFS changes which may bear some looking at (in reverse chronological order). I think it would be worth skimming the changes in each of these commits to see whether they've got anything which would affect ZoL. In fact, I'll note that XFS was involved in most of these (at least the first 4 in the list). In my experience, that's a good place to start for whoever might look into this. In summary, it looks like we've got some more 4.1 compatibility stuff to deal with.
I'm only occasionally using loop devices, but running into a BUG or even a lockup would be a real show-stopper, so I took a look at the mailing list. A possibly related XFS & loop upstream thread: http://marc.info/?l=linux-mm&m=143745156221454&w=2 ("[regression 4.2-rc3] loop: xfstests xfs/073 deadlocked in low memory conditions"). The problem there merely seems to be triggered by:
http://marc.info/?l=linux-mm&m=143745156221454&w=2
Potential fix: http://marc.info/?l=linux-kernel&m=143746915525411&w=2
Starting from Linux 4.1, bio_vec will be allowed to pass into the filesystem via iter_read/iter_write, so we add a bio_vec field in uio_t to hold it, and use UIO_BVEC in segflg to determine which "vec" is in use. Also, to be consistent with newer kernels, we make iovec and bio_vec immutable and make uio act as an iterator, with the new uio_skip field indicating the number of bytes to skip in the first segment.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#3511
Issue openzfs/zfs#3640
Closes #468
Starting from Linux 4.1, an iov_iter with bio_vec is allowed to be passed into iter_read/iter_write. Notably, the loop device will pass bio_vec to the backend filesystem. However, the current ZFS code assumes iovec without any check, so it will always crash when using a loop device. With the restructured uio_t, we can safely pass bio_vec in uio_t with UIO_BVEC set. The uio* functions are modified to handle the bio_vec case separately. The const uio_iov causes some warnings in the xuio-related code, so we explicitly convert them to non-const.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#3511
Closes openzfs#3640
Looking forward to this release. I had to downgrade my kernel, zfs-git, zfs-utils-git, and spl-git packages on Arch to get my server stable. I was seeing the same null pointer dereference as in this bug: #3640. Thanks for the hard work, guys.
Mounting a DVD image located on a ZFS dataset revealed a regression with 4.1-rc7. It used to work with 4.0, and it does work with 4.1 if the loop device's backing file is located on a plain FAT partition and not on a zpool. The zpool is layered above dm-crypt, but I believe that's irrelevant.

To reproduce, run

`dd if=/dev/zero of=testfs bs=1M count=32 && losetup /dev/loop0 testfs`

then wait briefly to get the following hard lockup caused by the workqueue item `cache_reap`:

Alternatively, issue `mkfs -t ext2 /dev/loop0` immediately after the `losetup` to get a hard lockup at `kernfs_fop_write+0xaa` caused by systemd-udevd, plus another one from systemd-udevd, plus another one in `dsl_dir_tempreserve_clear+0xfd` (the system then continues spewing out oopses every few seconds, but console switching is no longer possible, nor is writing to the filesystem, so I had to grab this with netconsole):

Where `kernfs_fop_write+0xaa` looks like this:

And `dsl_dir_tempreserve_clear+0xfd` looks like this (it's the invocation of `list_head()` in the while loop condition of `dsl_dir_tempreserve_clear()`; this calls `list_empty()`):