Fix empty xattr dir causing lockup #4123

tuxoko · 2015-12-18T19:52:18Z

During zfs_rmnode on a xattr dir, if the system crash just after
dmu_free_long_range, we would get empty xattr dir in delete queue. This would
cause blkid=0 be passed into zap_get_leaf_byblk when doing zfs_purgedir during
mount, and would try to do rw_enter on a wrong structure and cause system
lockup.

We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.

Signed-off-by: Chunwei Chen [email protected]

tuxoko · 2015-12-19T01:31:46Z

No this doesn't work.

@behlendorf
Do you know how do we check if an dmu object is empty?

behlendorf · 2015-12-19T01:51:21Z

Right, this is possible because truncate and removal are spread over multiple TXs. Ideally they should all be in one TX to ensure that the objects on the unlinked set are always in a consistent state even if the system were to crash. One remaining question is if the upstream open zfs code has the same flaw.

During zfs_rmnode on a xattr dir, if the system crash just after dmu_free_long_range, we would get empty xattr dir in delete queue. This would cause blkid=0 be passed into zap_get_leaf_byblk when doing zfs_purgedir during mount, and would try to do rw_enter on a wrong structure and cause system lockup. We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk. Signed-off-by: Chunwei Chen <[email protected]>

tuxoko · 2015-12-19T02:03:25Z

@behlendorf
I revert back to check blkid in zap_get_leaf_byblk, because it worked as expect. I find it weird that we doesn't seems to be able to check is a dnode is empty or not. It will always assume we have the blkid 0 block, while we could check the allocated size. But I wonder if it would work if we got an embedded blkptr.

tuxoko · 2015-12-19T02:38:18Z

@behlendorf
So I look into dmu_free_long_range_impl, it seems it self is not in one tx. So we might get into a state that the object is freed half way. That might be even harder to detect because the header might be intact.

behlendorf · 2015-12-21T19:06:35Z

@tuxoko nice job getting to the root cause. This definitely looks like an upstream OpenZFS issue, however in practice I bet it very rare on other platforms due to their limited use of file xattrs/forks and the need to crash at exactly the right time. On Linux with SELinux enabled we can easily have xattrs/forks for every file and desktop style usage when systems are more regularly just powered off suddenly.

We need to fix this is two parts. First off, zfs_rmnode() must be updated such that it can crash at any time and guarantee the unlinked set on disk is in a consistent state. This should be straight forward to address by only calling dmu_free_long_range() explicitly on normal files. This code is here to handle the case on massive files where the truncation needs to be split up over multiple transaction groups. In this case it's safe to crash at any time because this is just file data.

ZAPs on the other hand should never be handled in this way, then truncate and unlink must happen in the same TX or you can get a damaged ZAP exactly like the one in this issue. The good news is this is exactly what happens down a few lines zfs_znode_delete(). The truncate, destroy, and removal from the unlinked set are all handled in a single TX. So the fix here should be tiny, just restrict dmu_free_long_range() to normal files.

The second part you already have a patch for. When this does inevitably happen either due to a system not running the latest code or because it happened on another platform there needs to be someway to handle it. Ideally this is something we can check for in zfs_rmnode() now that we understand how it's going to be damaged. We could put the fix in the ZAP code but I think it would be best to avoid that because it doesn't cover all possible cases. For example, a large enough ZAP could only be partially truncated in which case we can't rely on the blkid being zero. Worst case we can make the fix there but I think we can do better.

tuxoko · 2015-12-21T19:34:39Z

@behlendorf
Oh, indeed zfs_znode_delete seems to do that in one transaction. I was originally thinking about removing ZFS_XATTR flag after purgedir. But it makes it much more easy.

We need truncate and remove be in the same tx when doing zfs_rmnode on xattr dir. Otherwise, if we truncate and crash, we'll end up with inconsistent zap object on the delete queue. We do this by skipping dmu_free_long_range and let zfs_znode_delete to do the work. Signed-off-by: Chunwei Chen <[email protected]>

tuxoko · 2015-12-21T23:40:02Z

Updated to only free regular file.
Normal directory shouldn't be a problem, because it's emptied and unlinked, and no one would iterate it. But I guess this is safer.

behlendorf · 2015-12-23T01:09:14Z

@tuxoko personally I think this is a little easier to read too. Any thoughts on how we might check for this more cleanly in zfs_purgedir()? I'm not thrilled with add this kind of special case error handling code to the ZAP code. It doesn't cover all possible causes and it might end up obscuring other coding mistakes.

behlendorf · 2015-12-28T17:53:18Z

Merged as:

f5f087e Make xattr dir truncate and remove in one tx
29572cc Fix empty xattr dir causing lockup

tuxoko mentioned this pull request Dec 18, 2015

ZFS 0.6.5.3 - zpool import after hard reboot causes kernel panic #4114

Closed

tuxoko force-pushed the zglb branch from ec59e8d to a1c14c8 Compare December 19, 2015 01:27

tuxoko changed the title ~~Try to recover from empty xattr dir causing lockup~~ Fix empty xattr dir causing lockup Dec 19, 2015

tuxoko force-pushed the zglb branch from a1c14c8 to 16f7d95 Compare December 19, 2015 01:59

tuxoko force-pushed the zglb branch from cb57b43 to da8264a Compare December 21, 2015 23:37

This was referenced Dec 23, 2015

ZFS breaks on filesystem mount #4052

Closed

Nasty deadlock on zpool import #4006

Closed

behlendorf closed this Dec 28, 2015

tuxoko deleted the zglb branch March 3, 2017 19:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix empty xattr dir causing lockup #4123

Fix empty xattr dir causing lockup #4123

tuxoko commented Dec 18, 2015

tuxoko commented Dec 19, 2015

behlendorf commented Dec 19, 2015

tuxoko commented Dec 19, 2015

tuxoko commented Dec 19, 2015

behlendorf commented Dec 21, 2015

tuxoko commented Dec 21, 2015

tuxoko commented Dec 21, 2015

behlendorf commented Dec 23, 2015

behlendorf commented Dec 28, 2015

Fix empty xattr dir causing lockup #4123

Fix empty xattr dir causing lockup #4123

Conversation

tuxoko commented Dec 18, 2015

tuxoko commented Dec 19, 2015

behlendorf commented Dec 19, 2015

tuxoko commented Dec 19, 2015

tuxoko commented Dec 19, 2015

behlendorf commented Dec 21, 2015

tuxoko commented Dec 21, 2015

tuxoko commented Dec 21, 2015

behlendorf commented Dec 23, 2015

behlendorf commented Dec 28, 2015