Fix empty xattr dir causing lockup #4123
No, this doesn't work. @behlendorf
Right, this is possible because truncate and removal are spread over multiple TXs. Ideally they should all be in one TX, so that objects on the unlinked set are always in a consistent state even if the system crashes. One remaining question is whether the upstream OpenZFS code has the same flaw.
During zfs_rmnode on an xattr dir, if the system crashes just after dmu_free_long_range, we are left with an empty xattr dir in the delete queue. This causes blkid=0 to be passed into zap_get_leaf_byblk when zfs_purgedir runs during mount, which then tries to rw_enter on the wrong structure and locks up the system. We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk.
Signed-off-by: Chunwei Chen <[email protected]>
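The guard described in this commit message can be illustrated with a small stand-alone model. This is only a sketch, not the actual ZFS code: `toy_zap_t`, `toy_leaf_t`, and `toy_get_leaf_byblk` are hypothetical stand-ins, and the model assumes that block 0 holds the ZAP header and is therefore never a valid leaf block.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real ZFS structures. */
typedef struct toy_leaf {
	uint64_t l_blkid;
} toy_leaf_t;

typedef struct toy_zap {
	toy_leaf_t *leaves;	/* leaves[i] models the leaf at block i + 1 */
	uint64_t nleaves;
} toy_zap_t;

/*
 * Toy analogue of zap_get_leaf_byblk. In this model block 0 is the
 * header, so a request for blkid 0 (as produced by an emptied xattr
 * dir) is rejected up front instead of being treated as a leaf.
 */
static int
toy_get_leaf_byblk(toy_zap_t *zap, uint64_t blkid, toy_leaf_t **lp)
{
	assert(zap != NULL && lp != NULL);

	*lp = NULL;
	if (blkid == 0)
		return (ENOENT);	/* the guard: never a leaf */
	if (blkid > zap->nleaves)
		return (ENOENT);	/* out of range in the toy model */

	*lp = &zap->leaves[blkid - 1];
	return (0);
}
```

With this shape, a caller that iterates a damaged directory sees an ordinary "no entry" error and can move on, rather than locking a structure that is not a leaf.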
@tuxoko nice job getting to the root cause. This definitely looks like an upstream OpenZFS issue; however, in practice I bet it's very rare on other platforms due to their limited use of file xattrs/forks and the need to crash at exactly the right time. On Linux with SELinux enabled we can easily have xattrs/forks for every file, and with desktop-style usage systems are more regularly just powered off suddenly.

We need to fix this in two parts. First off, ZAPs should never be handled in this way: the truncate and unlink must happen in the same TX or you can get a damaged ZAP exactly like the one in this issue. The good news is this is exactly what happens down a few lines

The second part you already have a patch for. When this does inevitably happen, either due to a system not running the latest code or because it happened on another platform, there needs to be some way to handle it. Ideally this is something we can check for in
Truncate and remove need to be in the same tx when zfs_rmnode runs on an xattr dir. Otherwise, if we truncate and then crash, we end up with an inconsistent zap object on the delete queue. We do this by skipping dmu_free_long_range and letting zfs_znode_delete do the work.
Signed-off-by: Chunwei Chen <[email protected]>
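To make the crash window concrete, here is a minimal simulation of the two orderings. It is a sketch under stated assumptions, not the real zfs_rmnode: `toy_znode_t`, `toy_rmnode`, and `toy_is_damaged` are hypothetical names, and a "crash" is modelled simply as a cap on how many transactions get committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the real ZFS znode. */
typedef enum { TOY_REG, TOY_XATTR_DIR } toy_type_t;

typedef struct toy_znode {
	toy_type_t type;
	bool truncated;	/* contents have been freed */
	bool deleted;	/* object has been removed */
} toy_znode_t;

/*
 * Toy analogue of zfs_rmnode. max_txs caps how many transactions
 * commit before a simulated crash. When only_free_regular is false,
 * every object is truncated in its own TX first (the old ordering);
 * when true, only regular files are, and a directory is emptied and
 * removed in a single TX. Returns the number of committed TXs.
 */
static int
toy_rmnode(toy_znode_t *zp, bool only_free_regular, int max_txs)
{
	int txs = 0;

	assert(zp != NULL);

	if (!only_free_regular || zp->type == TOY_REG) {
		if (txs == max_txs)
			return (txs);	/* crashed before truncate */
		zp->truncated = true;	/* separate truncate TX */
		txs++;
	}

	if (txs == max_txs)
		return (txs);		/* crashed before removal */
	zp->truncated = true;		/* removal frees contents too */
	zp->deleted = true;
	txs++;
	return (txs);
}

/* An emptied but still-present xattr dir is the damaged state. */
static bool
toy_is_damaged(const toy_znode_t *zp)
{
	return (zp->type == TOY_XATTR_DIR && zp->truncated && !zp->deleted);
}
```

In this model, the old ordering can leave an xattr dir truncated but not deleted if the crash falls between the two TXs, while the new ordering leaves it either untouched or fully removed. A regular file that is truncated but not yet deleted is not treated as damaged here.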
Updated to only free regular files.
@tuxoko personally I think this is a little easier to read too. Any thoughts on how we might check for this more cleanly in