rcu_sched stall, then lockup when mounting zfs #1176
@akorn There's always hope. There are at least three fairly easy things to try.
Obviously we'll want to get this fixed, but in the meantime you should be able to recover using one of the above methods.
Thanks. The fs contains a backup and makes heavy use of xattrs (due to rsync --fake-super); I don't need to read it (right now :), but it'd be useful to be able to keep updating it. I hope I'll get around to trying the -rc12 version at least. I don't suppose there is a way to roll just this fs back to some earlier version without mounting it, so that the offending operation is not carried out? I have several snapshots... Will give it a try.
@akorn Two more thoughts.
NOTE: This is almost certainly related to commit 53c7411. During normal usage this was safe because xattrs always hold a reference on their parent, and the VFS will never prune an inode that still has a reference. However, during zfs_unlinked_drain(), which is called for cleanup at mount time, they are handled strictly in the order in which they appear in the ZAP. One fairly simple fix might be to adjust zfs_unlinked_drain() so that it first fully traverses the unlinked-object ZAP, bringing all the inodes in-core. Then, in a second pass, it does an iput() on each of them. This should allow the VFS to take the additional parent references while the inodes are being populated. With all the references in place, the VFS should free the inodes in the correct order (xattr children, then parents) during the second pass of iput() calls.
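To make the proposed two-pass idea concrete, here is a toy userspace model. It is not ZFS or kernel code: `node_t`, `node_get()`, `node_put()`, `drain_single_pass()`, `drain_two_pass()` and `run_scenario()` are hypothetical stand-ins for a znode, an in-core lookup, iput(), and the two possible shapes of zfs_unlinked_drain(). The model only counts "a parent was freed while one of its xattr children was still live" as a proxy for the unsafe ordering; it does not reproduce the actual deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified in-core inode; not the real znode. */
typedef struct node {
    struct node *parent; /* owning file for an xattr, NULL otherwise */
    int refs;            /* in-core references held on this node */
    int live_children;   /* xattr children not yet evicted */
    bool in_core;        /* has this node been looked up yet? */
    bool holds_parent;   /* did this xattr pin its parent when loaded? */
    bool evicted;        /* freed; later lookups fail */
} node_t;

static int violations; /* parents freed before their xattr children */

static void node_evict(node_t *n);

/* Drop one reference; evict the node when the last one goes away. */
static void node_put(node_t *n) {
    assert(n->refs > 0);
    if (--n->refs == 0)
        node_evict(n);
}

/* Free a node; an xattr child releases the hold it took on its parent. */
static void node_evict(node_t *n) {
    if (n->live_children > 0)
        violations++; /* unsafe order: parent freed with live children */
    n->evicted = true;
    if (n->parent) {
        n->parent->live_children--;
        if (n->holds_parent)
            node_put(n->parent);
    }
}

/* Look a node up; loading an xattr child also pins its parent. */
static bool node_get(node_t *n) {
    if (n->evicted)
        return false; /* object already freed: lookup fails */
    if (!n->in_core) {
        n->in_core = true;
        if (n->parent && node_get(n->parent))
            n->holds_parent = true;
    }
    n->refs++;
    return true;
}

/* Roughly the current shape: look up and release each entry in ZAP order. */
static void drain_single_pass(node_t **zap, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (node_get(zap[i]))
            node_put(zap[i]);
}

/* Proposed shape: bring every entry in-core first, then release them all.
 * Nothing can be evicted during the first pass, since it only adds refs. */
static void drain_two_pass(node_t **zap, size_t n) {
    for (size_t i = 0; i < n; i++)
        node_get(zap[i]);
    for (size_t i = 0; i < n; i++)
        node_put(zap[i]);
}

/* One file plus one xattr, listed parent-first as a ZAP might return them.
 * Returns the number of unsafe (parent-before-child) evictions. */
static int run_scenario(bool two_pass) {
    node_t parent = { .live_children = 1 };
    node_t child = { .parent = &parent };
    node_t *zap[] = { &parent, &child };

    violations = 0;
    if (two_pass)
        drain_two_pass(zap, 2);
    else
        drain_single_pass(zap, 2);
    return violations;
}
```

With the parent listed first, `run_scenario(false)` reports one unsafe eviction, because the parent's only reference is dropped before the xattr child has been loaded to pin it. `run_scenario(true)` reports none: in the first pass the child takes a reference on the parent, so the parent can only reach zero references after the child is evicted in the second pass.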
I rolled this fs back to the latest snapshot (without mounting it), and that also allowed me to subsequently mount it. This could happen again after any unclean shutdown, right? Or also after clean shutdowns (where I even export the pool)? Thanks!
Yes, it could happen if you unmount the file system shortly after unlinking a bunch of xattr files. |
This reverts commit 53c7411, effectively reinstating the asynchronous xattr cleanup code. These Linux changes were reverted because, after testing and careful contemplation, I was convinced that due to the 89260a1c8851ce05ea04b23606ba438b271d890 commit they were no longer required. Unfortunately, the deadlock described in openzfs#1176 was a case which wasn't considered. At mount, zfs_unlinked_drain() can run, and it will unlink a list of znodes in an effectively random order, which isn't safe. The only reason it was safe to originally revert this change was that we could guarantee that the VFS would always prune the xattr leaves before the parents. Therefore, until we can cleanly resolve this deadlock for all cases, we need to keep this change in spite of the xattr unlink performance penalty associated with it.
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1176
Issue openzfs#457
Hi,
I have a specific dataset that is somehow "broken" and will cause a crash if a mount is attempted.
In the kernel log I see the following:
Then, 15 minutes and 1 second later:
At this point, the box locks up and doesn't even respond to ping anymore.
I'm running zfs 0.6.0.91 with kernel 3.7.1. AFAICT there was no specific event that caused this, but when it first happened I was running an earlier kernel (3.5.7) and a somewhat, but not much, earlier zfsonlinux (0.6.0.89, maybe).
Is there any hope of salvaging this filesystem, or will I have to destroy it?