PANIC on umount due to iput taskq #3281
We need to hold the z_teardown_lock as a writer whilst draining the iput taskq so the taskq isn't being filled up again behind us. These changes revert commit @fd23720 and address the deadlock raised in issue openzfs#1988 by removing ZFS_ENTER from zfs_putpage and zpl_writepages. Also remove a redundant zil_commit from zpl_writepages.

Signed-off-by: Chris Dunlop <[email protected]>
Closes openzfs#3281
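In `zfs_sb_teardown` terms, the ordering this commit describes would look roughly like the following (a sketch only, not the verbatim patch; `zsb` is the `zfs_sb_t` being torn down):

```c
/*
 * Illustrative ordering only: take z_teardown_lock as writer first,
 * so no new ZFS_ENTER can submit work, then drain the iput taskq
 * knowing nothing can refill it behind us.
 */
rrw_enter(&zsb->z_teardown_lock, RW_WRITER, FTAG);
taskq_wait(dsl_pool_iput_taskq(dmu_objset_pool(zsb->z_os)));
```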
@chrisrd nice analysis. What you're proposing I think will work, and may be a good short term fix. But I think we can, and should, go farther. Let me try and summarize what I'm thinking.

So the ZFS_ENTER/ZFS_EXIT macros are there, as you know, to coordinate a file system unmount. What isn't clear from the code is that this is actually entirely redundant under Linux. Unlike under illumos (and maybe FreeBSD too?) the Linux VFS already makes certain guarantees we can take advantage of. We just aren't. For all the …

What would be nice is to define ZFS_ENTER/ZFS_EXIT as no-ops. Or perhaps tweak them so they instead take/drop a reference on the super block rather than a lock. That would eliminate the locking issue you've observed, likely be more performant, and more Linuxy.

There definitely is one mystery remaining which needs to be explained and that's this exact bug. When entering …
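For concreteness, one purely hypothetical shape for the "reference instead of a lock" idea (the `z_active_count` field is invented here for illustration; nothing like this was settled in this thread, and the super-block variant would lean on the sb's own refcounting instead):

```c
/*
 * Hypothetical sketch: ZFS_ENTER/ZFS_EXIT as a reference count on the
 * filesystem rather than a read lock on z_teardown_lock.
 * z_active_count is an invented uint32_t field, not real ZoL code.
 */
#define ZFS_ENTER(zsb)	atomic_inc_32(&(zsb)->z_active_count)
#define ZFS_EXIT(zsb)	atomic_dec_32(&(zsb)->z_active_count)
```

Teardown would then wait for the count to drain to zero instead of taking the lock for write.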
@behlendorf Re: #3282 (comment) "has this patch resolved the issue for you in practice?"

It's too early to tell... I haven't been able to provoke the issue in my own testing, I've only seen this issue on the buildbots when looking at the failures for the Illumos 5056 pull request (#3254). The latest test run there gave a green board, but that doesn't mean much given the infrequency of the problem: I've only seen it 3 times in maybe 5 or 6 buildbot runs x 14? buildbots == 70+ individual test runs. Given that I've only observed the problem with #3254 it could well be related specifically to that pull request.

Not being able to reproduce the problem locally meant I was constrained to static analysis, which led me down this path. (Obviously I didn't delve deep enough into it to realise that, per your other comments, the Linux VFS should be preventing this occurring in the first place.) I don't know if the issue has ever cropped up elsewhere, but google suggests it hasn't been seen in the wild (or at least hasn't been reported). Google doesn't appear to index the buildbot output - is there a way of searching for similar failures in other buildbot runs (i.e. unrelated to that pull request)?

This output displays the problem nicely, with some additional debug logging per @4348dc8a: …

In that case, when we hit … ...argh. I've just realised that all the iput queue entries seen in …

Obviously it would be far easier if I could provoke the problem myself: are there any guidelines on configuring and running the buildbot test suite locally?

With respect to your ZFS_ENTER/ZFS_EXIT comments, "more Linuxy" sounds like a good approach, but I'd have to understand what's happening on the Linux side before I could do anything about it.
We need to wait for the iput taskq twice to empty the z_all_znodes list, because the first time through can add the parents of dir-based xattrs to the taskq:

iput
  iput_final
    evict
      destroy_inode
        zpl_inode_destroy
          zfs_inode_destroy
            zfs_iput_async(ZTOI(zp->z_xattr_parent))
              taskq_dispatch(dsl_pool_iput_taskq..., iput, ...)

...and only twice, because we never have xattrs on xattrs.

Signed-off-by: Chris Dunlop <[email protected]>
Closes openzfs#3281
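By way of illustration, waiting twice might look like this (a sketch only; the taskq lookup matches what's used elsewhere in this thread, and `zsb` follows the ZoL naming of this era):

```c
taskq_t *tq = dsl_pool_iput_taskq(dmu_objset_pool(zsb->z_os));

/*
 * First wait: drains the VFS-queued iputs, which may themselves
 * queue iputs for the parents of dir-based xattrs.
 */
taskq_wait(tq);

/*
 * Second wait: drains those parent iputs.  Two passes suffice
 * because xattrs never have xattrs of their own.
 */
taskq_wait(tq);
```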
Just to clarify...

and:

...are incorrect and a fundamental mischaracterisation of the issue. The issue is not that the iput taskq contains these entries, it's that the … With stock … It probably means something - but I don't know what - that this only happens on a multi-core VM: on a single core VM I haven't seen any entries in the …

Separately, it looks like the iput taskq can run:
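The chain, as given in the commit message above:

iput
  iput_final
    evict
      destroy_inode
        zpl_inode_destroy
          zfs_inode_destroy
            zfs_iput_async(ZTOI(zp->z_xattr_parent))
              taskq_dispatch(dsl_pool_iput_taskq..., iput, ...)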
i.e. it seems, if we're iput-ing a (dir-based) xattr, we'll put its parent on the iput taskq. Should we be waiting for the taskq twice? E.g. in …

Test script and …
Ahh, that's it! You've struck on a reasonable root cause. That's right, it is possible to redispatch to the taskq an iput for the parent inode. This is also exactly the sort of thing the zfs-stress torture tests would expose but would be amazingly unlikely on real systems. I would think a reasonable fix for this would be to simply block in zfs_sb_teardown in the unmount case until zsb->z_all_znodes drops to zero.
By the time we're tearing down our superblock the VFS has started releasing all our inodes/znodes. Some of this work may have been handed off to our iput taskq so we need to wait for that work to complete. However the iput from the taskq can itself result in additional work being added to the taskq:

dsl_pool_iput_taskq
  iput
    iput_final
      evict
        destroy_inode
          zpl_inode_destroy
            zfs_inode_destroy
              zfs_iput_async(ZTOI(zp->z_xattr_parent))
                taskq_dispatch(dsl_pool_iput_taskq..., iput, ...)

Let's wait until all our znodes have been released.

Signed-off-by: Chris Dunlop <[email protected]>
Closes openzfs#3281
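Concretely, something like this loop in `zfs_sb_teardown` (a sketch reconstructed from the commit message, not a quote of the patch; the real code may track a znode count rather than testing the list directly):

```c
/*
 * Sketch: each taskq drain can queue further iputs (parents of
 * dir-based xattrs), so keep draining until every znode attached
 * to this superblock has been released.
 */
while (!list_is_empty(&zsb->z_all_znodes))
	taskq_wait(dsl_pool_iput_taskq(dmu_objset_pool(zsb->z_os)));
```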
@behlendorf Like that?
By the time we're tearing down our superblock the VFS has started releasing all our inodes/znodes. Some of this work may have been handed off to our iput taskq so we need to wait for that work to complete. However the iput from the taskq can itself result in additional work being added to the taskq:

dsl_pool_iput_taskq
  iput
    iput_final
      evict
        destroy_inode
          zpl_inode_destroy
            zfs_inode_destroy
              zfs_iput_async(ZTOI(zp->z_xattr_parent))
                taskq_dispatch(dsl_pool_iput_taskq..., iput, ...)

Let's wait until all our znodes have been released.

Signed-off-by: Chris Dunlop <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #3281
Updates ZFS and SPL to latest maintenance version. Includes the following:

Bug Fixes:
* Fix panic due to corrupt nvlist when running utilities (openzfs/zfs#3335)
* Fix hard lockup due to infinite loop in zfs_zget() (openzfs/zfs#3349)
* Fix panic on unmount due to iput taskq (openzfs/zfs#3281)
* Improve metadata shrinker performance on pre-3.1 kernels (openzfs/zfs#3501)
* Linux 4.1 compat: use read_iter() / write_iter()
* Linux 3.12 compat: NUMA-aware per-superblock shrinker
* Fix spurious hung task watchdog stack traces (openzfs/zfs#3402)
* Fix module loading in zfs import systemd service (openzfs/zfs#3440)
* Fix intermittent libzfs_init() failure to open /dev/zfs (openzfs/zfs#2556)

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I053087317ff9e5bedc1671bb46062e96bfe6f074
Reviewed-on: http://review.whamcloud.com/15481
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
We're occasionally seeing stack traces like:
E.g.:
http://buildbot.zfsonlinux.org/builders/centos-7.0-x86_64-builder/builds/1817/steps/shell_17/logs/stdio
http://buildbot.zfsonlinux.org/builders/fedora-18-x86_64-builder/builds/2996/steps/shell_17/logs/stdio
http://buildbot.zfsonlinux.org/builders/fedora-21-x86_64-builder/builds/920/steps/shell_17/logs/stdio
Commit @fd23720 (with my own Signed-off-by!) resolved a deadlock (#1988) by moving the draining of the iput taskq outside `z_teardown_lock`:

With the drain outside this lock there's nothing to prevent further work being submitted between the `taskq_wait` and the `z_teardown_lock` being obtained, leading to the PANIC above if the taskq isn't empty by the time we're freeing resources in `zfs_sb_free`.

It seems the correct solution is to move the drain back inside `z_teardown_lock` and resolve the original deadlock some other way...

The `z_teardown_lock` is used in ZFS_ENTER/ZFS_EXIT to prevent further processes submitting work to ZFS whilst we're unmounting the filesystem. The only time `z_teardown_lock` is held for write (i.e. blocking ZFS_ENTER) is in `zfs_sb_teardown` above.

The original deadlock was caused by `dsl_pool_iput_taskq` calling `iput`:
The ZFS_ENTER in `zfs_putpage` was added in @4c837f0d, with the comment:

I.e. it seems there's no actual need for ZFS_ENTER in `zfs_putpage`.

`zpl_writepages` was added in @119a394:

It's not clear to me why the first ZFS_ENTER instance is there. Perhaps to protect against `z_os` disappearing? However `zpl_writepages` is either being called before we start to unmount or it's being called as part of the final cleanup before we unmount, i.e. `z_os`
is always valid. I think this first ZFS_ENTER can be safely removed.

We can see from the `iput` call chain above, `zpl_writepages` is called with `wbc->sync_mode == WB_SYNC_ALL`. This means we get into the `if (sync_mode != wbc->sync_mode)` section of `zpl_writepages` where we hit the second ZFS_ENTER protecting the `zil_commit`. However following this, we change the sync mode back to WB_SYNC_ALL and call `write_cache_pages` again, which also does a `zil_commit` (via the `zfs_putpage` where we're also removing ZFS_ENTER):
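Roughly, the part of `zpl_writepages` in question looks like this (a simplified sketch from memory of @119a394, so treat names and details as approximate rather than a quote of the source):

```c
static int
zpl_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
	znode_t *zp = ITOZ(mapping->host);
	zfs_sb_t *zsb = ITOZSB(mapping->host);
	enum writeback_sync_modes sync_mode = wbc->sync_mode;
	int result;

	/*
	 * Run write_cache_pages() in non-SYNC mode first so the ZIL
	 * gets all the data, then make it stable in one go.
	 */
	wbc->sync_mode = WB_SYNC_NONE;
	result = write_cache_pages(mapping, wbc, zpl_putpage, mapping);

	if (sync_mode != wbc->sync_mode) {
		/* The second ZFS_ENTER/zil_commit discussed above. */
		ZFS_ENTER(zsb);
		zil_commit(zsb->z_log, zp->z_id);
		ZFS_EXIT(zsb);

		/*
		 * Re-run in SYNC mode to pick up any pages the
		 * non-SYNC pass skipped; for WB_SYNC_ALL this path
		 * reaches zfs_putpage(), which issues its own
		 * zil_commit() - hence the apparent redundancy.
		 */
		wbc->sync_mode = sync_mode;
		result = write_cache_pages(mapping, wbc, zpl_putpage,
		    mapping);
	}

	return (result);
}
```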
ZFS_ENTER
, and thezil_commit
it's protecting, can be removed.Overall, ZFS_ENTER is used to protect against further activity being added to our file system whilst we're tearing it down. It shouldn't be used to prevent "in progress" work being completed, e.g. that being finished off by the iput taskq.
At least that's what I think. It would be good to get some other eyes on this!