Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty xattr dir causing lockup #4123

Closed
wants to merge 2 commits into from
Closed

Conversation

tuxoko
Copy link
Contributor

@tuxoko tuxoko commented Dec 18, 2015

During zfs_rmnode on a xattr dir, if the system crash just after
dmu_free_long_range, we would get empty xattr dir in delete queue. This would
cause blkid=0 be passed into zap_get_leaf_byblk when doing zfs_purgedir during
mount, and would try to do rw_enter on a wrong structure and cause system
lockup.

We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.

Signed-off-by: Chunwei Chen [email protected]

@tuxoko tuxoko changed the title Try to recover from empty xattr dir causing lockup Fix empty xattr dir causing lockup Dec 19, 2015
@tuxoko
Copy link
Contributor Author

tuxoko commented Dec 19, 2015

No this doesn't work.

@behlendorf
Do you know how do we check if an dmu object is empty?

@behlendorf
Copy link
Contributor

Right, this is possible because truncate and removal are spread over multiple TXs. Ideally they should all be in one TX to ensure that the objects on the unlinked set are always in a consistent state even if the system were to crash. One remaining question is if the upstream open zfs code has the same flaw.

During zfs_rmnode on a xattr dir, if the system crash just after
dmu_free_long_range, we would get empty xattr dir in delete queue. This would
cause blkid=0 be passed into zap_get_leaf_byblk when doing zfs_purgedir during
mount, and would try to do rw_enter on a wrong structure and cause system
lockup.

We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.

Signed-off-by: Chunwei Chen <[email protected]>
@tuxoko
Copy link
Contributor Author

tuxoko commented Dec 19, 2015

@behlendorf
I revert back to check blkid in zap_get_leaf_byblk, because it worked as expect. I find it weird that we doesn't seems to be able to check is a dnode is empty or not. It will always assume we have the blkid 0 block, while we could check the allocated size. But I wonder if it would work if we got an embedded blkptr.

@tuxoko
Copy link
Contributor Author

tuxoko commented Dec 19, 2015

@behlendorf
So I look into dmu_free_long_range_impl, it seems it self is not in one tx. So we might get into a state that the object is freed half way. That might be even harder to detect because the header might be intact.

@behlendorf
Copy link
Contributor

@tuxoko nice job getting to the root cause. This definitely looks like an upstream OpenZFS issue, however in practice I bet it very rare on other platforms due to their limited use of file xattrs/forks and the need to crash at exactly the right time. On Linux with SELinux enabled we can easily have xattrs/forks for every file and desktop style usage when systems are more regularly just powered off suddenly.

We need to fix this is two parts. First off, zfs_rmnode() must be updated such that it can crash at any time and guarantee the unlinked set on disk is in a consistent state. This should be straight forward to address by only calling dmu_free_long_range() explicitly on normal files. This code is here to handle the case on massive files where the truncation needs to be split up over multiple transaction groups. In this case it's safe to crash at any time because this is just file data.

ZAPs on the other hand should never be handled in this way, then truncate and unlink must happen in the same TX or you can get a damaged ZAP exactly like the one in this issue. The good news is this is exactly what happens down a few lines zfs_znode_delete(). The truncate, destroy, and removal from the unlinked set are all handled in a single TX. So the fix here should be tiny, just restrict dmu_free_long_range() to normal files.

The second part you already have a patch for. When this does inevitably happen either due to a system not running the latest code or because it happened on another platform there needs to be someway to handle it. Ideally this is something we can check for in zfs_rmnode() now that we understand how it's going to be damaged. We could put the fix in the ZAP code but I think it would be best to avoid that because it doesn't cover all possible cases. For example, a large enough ZAP could only be partially truncated in which case we can't rely on the blkid being zero. Worst case we can make the fix there but I think we can do better.

@tuxoko
Copy link
Contributor Author

tuxoko commented Dec 21, 2015

@behlendorf
Oh, indeed zfs_znode_delete seems to do that in one transaction. I was originally thinking about removing ZFS_XATTR flag after purgedir. But it makes it much more easy.

We need truncate and remove be in the same tx when doing zfs_rmnode on xattr
dir. Otherwise, if we truncate and crash, we'll end up with inconsistent zap
object on the delete queue. We do this by skipping dmu_free_long_range and let
zfs_znode_delete to do the work.

Signed-off-by: Chunwei Chen <[email protected]>
@tuxoko
Copy link
Contributor Author

tuxoko commented Dec 21, 2015

Updated to only free regular file.
Normal directory shouldn't be a problem, because it's emptied and unlinked, and no one would iterate it. But I guess this is safer.

@behlendorf
Copy link
Contributor

@tuxoko personally I think this is a little easier to read too. Any thoughts on how we might check for this more cleanly in zfs_purgedir()? I'm not thrilled with add this kind of special case error handling code to the ZAP code. It doesn't cover all possible causes and it might end up obscuring other coding mistakes.

@behlendorf
Copy link
Contributor

Merged as:

f5f087e Make xattr dir truncate and remove in one tx
29572cc Fix empty xattr dir causing lockup

@behlendorf behlendorf closed this Dec 28, 2015
@tuxoko tuxoko deleted the zglb branch March 3, 2017 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants