Fix race in dnode_check_slots_free() #8005
Currently, dnode_check_slots_free() works by checking dn->dn_type in the dnode to determine if the dnode is reclaimable. However, there is a small window of time between dnode_free_sync() in the first call to dsl_dataset_sync() and when the user accounting code is run, during which the type is set to DMU_OT_NONE but the dnode is not yet evictable, leading to crashes. This patch adds the ability for dnodes to track which txg they were last dirtied in and adds a check for this before performing the reclaim.

This patch also corrects several instances where dn_dirty_link was treated as a list_node_t when it is technically a multilist_node_t.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Requires-spl: spl-0.7-release
Issue openzfs#7147
Issue openzfs#7388
Issue openzfs#7997
Codecov Report
@@                 Coverage Diff                  @@
##           zfs-0.7-release    #8005     +/-    ##
==================================================
+ Coverage            72.49%   72.89%    +0.4%
==================================================
  Files                  289      289
  Lines                89767    89780     +13
==================================================
+ Hits                 65077    65448    +371
+ Misses               24690    24332    -358
Seems to work for me. Without this patch my glusterfs nodes stop responding within an hour after booting. With this patch, no hangs for over a day now. Nice
Closing. Will be included in 0.7.12.
Don't see this issue in the release notes of 0.7.12
This patch was included via commit b32f127 which is included in the 0.7.12 branch.
@bunder2015 Thanks!
I still see this happening with Ubuntu 19.04 and the included zfs v0.7.12-1ubuntu5. Is there any known workaround for this?
Motivation and Context
Backport of edc1e71 for the 0.7 release branch. See issue #7997.
Description
Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the user accounting code
is run, during which the type is set to DMU_OT_NONE but the dnode
is not yet evictable, leading to crashes. This patch adds the
ability for dnodes to track which txg they were last dirtied in and
adds a check for this before performing the reclaim.
This patch also corrects several instances where dn_dirty_link was
treated as a list_node_t when it is technically a multilist_node_t.
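The idea behind the added check can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the sketch_* names, the reduced struct, and the choice to compare the recorded txg against a caller-supplied "currently syncing" txg are all assumptions made for illustration. Only dn_type, DMU_OT_NONE, and the notion of a last-dirtied txg come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DMU_OT_NONE; not the real ZFS definition. */
#define SKETCH_OT_NONE 0

/* Hypothetical, heavily reduced dnode: only the two fields the check needs. */
typedef struct sketch_dnode {
	int      dn_type;       /* object type; SKETCH_OT_NONE once freed */
	uint64_t dn_dirty_txg;  /* txg in which the dnode was last dirtied */
} sketch_dnode_t;

/* Record the txg every time the dnode is dirtied. */
static void
sketch_dnode_set_dirty(sketch_dnode_t *dn, uint64_t txg)
{
	dn->dn_dirty_txg = txg;
}

/*
 * Assumption of this sketch: a dnode still counts as dirty while the txg
 * it was last dirtied in has not finished syncing.
 */
static bool
sketch_dnode_is_dirty(const sketch_dnode_t *dn, uint64_t syncing_txg)
{
	return (dn->dn_dirty_txg >= syncing_txg);
}

/* Old behaviour: only the type is consulted, so the window is missed. */
static bool
sketch_slots_free_type_only(const sketch_dnode_t *dns, int slots)
{
	assert(dns != NULL && slots > 0);
	for (int i = 0; i < slots; i++) {
		if (dns[i].dn_type != SKETCH_OT_NONE)
			return (false);
	}
	return (true);
}

/* New behaviour: a slot is reclaimable only if it is freed AND not dirty. */
static bool
sketch_slots_free(const sketch_dnode_t *dns, int slots, uint64_t syncing_txg)
{
	assert(dns != NULL && slots > 0);
	for (int i = 0; i < slots; i++) {
		if (dns[i].dn_type != SKETCH_OT_NONE)
			return (false);
		if (sketch_dnode_is_dirty(&dns[i], syncing_txg))
			return (false);
	}
	return (true);
}
```

In this model, a dnode whose type has already been reset but which was dirtied in the txg that is still syncing passes the type-only check and fails the combined one, which corresponds to the window described above. Once the syncing txg has moved past the recorded one, both checks agree.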
How Has This Been Tested?
The same issue was observed in master, and the fix was integrated in April;
this is a clean backport of that change. I've built it locally, but I'm pushing
it to the bots for additional testing.