PANIC at dbuf.c:105:dbuf_dest() #3443
This duplicates #3249. I'll leave this open since there's more information here.
Bisected the bug to this commit: 4c7b7ee Illumos 5630 - stale bonus buffer in recycled dnode_t leads to data corruption. Not surprising since that's where
@nedbass I suspect the additional I haven't dug too deeply into this but I'm concerned about the following hunk from
I added the ASSERT as you suggested but didn't hit it before hitting the one from this issue.
I added some debugging to the SPL mutex implementation to get a stack trace from the thread holding the mutex being destroyed.
diff --git a/include/sys/mutex.h b/include/sys/mutex.h
index 9b297e9..fec1d68 100644
--- a/include/sys/mutex.h
+++ b/include/sys/mutex.h
@@ -29,6 +29,10 @@
#include <linux/mutex.h>
#include <linux/compiler_compat.h>
+
+#define SPL_MUTEX_MAGIC 0xABABABABABABABAB
+#define SPL_MUTEX_POISON ~(SPL_MUTEX_MAGIC)
+
typedef enum {
MUTEX_DEFAULT = 0,
MUTEX_SPIN = 1,
@@ -39,6 +43,7 @@ typedef struct {
struct mutex m_mutex;
spinlock_t m_lock; /* used for serializing mutex_exit */
kthread_t *m_owner;
+ uint64_t m_magic;
} kmutex_t;
#define MUTEX(mp) (&((mp)->m_mutex))
@@ -75,18 +80,20 @@ spl_mutex_clear_owner(kmutex_t *mp)
__mutex_init(MUTEX(mp), (name) ? (#name) : (#mp), &__key); \
spin_lock_init(&(mp)->m_lock); \
spl_mutex_clear_owner(mp); \
+ (mp)->m_magic = SPL_MUTEX_MAGIC; \
}
#undef mutex_destroy
#define mutex_destroy(mp) \
{ \
- VERIFY3P(mutex_owner(mp), ==, NULL); \
+ (mp)->m_magic = SPL_MUTEX_POISON; \
}
#define mutex_tryenter(mp) \
({ \
int _rc_; \
\
+ VERIFY3P((mp)->m_magic, ==, SPL_MUTEX_MAGIC); \
if ((_rc_ = mutex_trylock(MUTEX(mp))) == 1) \
spl_mutex_set_owner(mp); \
\
@@ -97,6 +104,7 @@ spl_mutex_clear_owner(kmutex_t *mp)
#define mutex_enter_nested(mp, subclass) \
{ \
ASSERT3P(mutex_owner(mp), !=, current); \
+ VERIFY3P((mp)->m_magic, ==, SPL_MUTEX_MAGIC); \
mutex_lock_nested(MUTEX(mp), (subclass)); \
spl_mutex_set_owner(mp); \
}
@@ -104,6 +112,7 @@ spl_mutex_clear_owner(kmutex_t *mp)
#define mutex_enter_nested(mp, subclass) \
{ \
ASSERT3P(mutex_owner(mp), !=, current); \
+ VERIFY3P((mp)->m_magic, ==, SPL_MUTEX_MAGIC); \
mutex_lock(MUTEX(mp)); \
spl_mutex_set_owner(mp); \
}
@@ -133,6 +142,7 @@ spl_mutex_clear_owner(kmutex_t *mp)
#define mutex_exit(mp) \
{ \
spin_lock(&(mp)->m_lock); \
+ VERIFY3P((mp)->m_magic, ==, SPL_MUTEX_MAGIC); \
spl_mutex_clear_owner(mp); \
mutex_unlock(MUTEX(mp)); \
spin_unlock(&(mp)->m_lock); \
Here is the stack trace I got from
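The idea behind the debugging patch above can be tried outside the kernel. What follows is a minimal userspace sketch, assuming POSIX threads; the `dbg_mutex_*` names are hypothetical and are not part of SPL. As in the patch, destroy only poisons the magic field, so a thread that still holds the mutex trips the check in its own exit path and its stack identifies the holder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace analogue of the SPL patch; not the SPL API. */
#define DBG_MUTEX_MAGIC  0xABABABABABABABABULL
#define DBG_MUTEX_POISON (~DBG_MUTEX_MAGIC)

typedef struct dbg_mutex {
    pthread_mutex_t m_mutex;
    uint64_t m_magic; /* MAGIC while live, POISON once destroyed */
} dbg_mutex_t;

static void dbg_mutex_init(dbg_mutex_t *mp)
{
    pthread_mutex_init(&mp->m_mutex, NULL);
    mp->m_magic = DBG_MUTEX_MAGIC;
}

/* Returns 1 while the mutex is live, 0 after it has been poisoned. */
static int dbg_mutex_is_live(const dbg_mutex_t *mp)
{
    return (mp->m_magic == DBG_MUTEX_MAGIC);
}

static void dbg_mutex_enter(dbg_mutex_t *mp)
{
    assert(dbg_mutex_is_live(mp)); /* refuse to lock a destroyed mutex */
    pthread_mutex_lock(&mp->m_mutex);
}

static void dbg_mutex_exit(dbg_mutex_t *mp)
{
    /* If another thread destroyed the mutex while we held it, fail here,
     * in the holder's context, so the holder's stack is what gets logged. */
    assert(dbg_mutex_is_live(mp));
    pthread_mutex_unlock(&mp->m_mutex);
}

/* Like the patch, only poison; the underlying lock is left untouched. */
static void dbg_mutex_destroy(dbg_mutex_t *mp)
{
    mp->m_magic = DBG_MUTEX_POISON;
}
```

The sketch assumes the structure's memory stays valid after destroy; if the memory is freed and reused, the magic check is only a best-effort detector.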
That stack is tremendously helpful. There's definitely a race in
diff --git a/module/zfs/dmu_objset.c b/module/zfs/dmu_objset.c
index ae4e1dd..0f455cd 100644
--- a/module/zfs/dmu_objset.c
+++ b/module/zfs/dmu_objset.c
@@ -1325,7 +1325,7 @@ dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before
if (before && dn->dn_bonuslen != 0)
data = DN_BONUS(dn->dn_phys);
else if (!before && dn->dn_bonuslen != 0) {
- if (dn->dn_bonus) {
+ if (RW_WRITE_HELD(&dn->dn_struct_rwlock) && dn->dn_bonus) {
db = dn->dn_bonus;
mutex_enter(&db->db_mtx);
data = dmu_objset_userquota_find_data(db, tx);
As an aside, I don't see why this issue wouldn't be in illumos as well.
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that we are either holding dn_struct_rwlock or a reference to the bonus dbuf when calling dmu_objset_userquota_get_ids(). Rename dmu_objset_userquota_get_ids() with an _impl suffix and add a wrapper function to take a reference on the bonus dbuf (if needed) before calling it. This was done to keep the code changes simple. Secondly, make a small change to dbuf_try_add_ref(). It checks db->db_blkid which may not be safe since it doesn't yet have a hold on the dbuf. Use the blkid argument instead. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#3443
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that we are either holding dn_struct_rwlock or a reference to the bonus dbuf when calling dmu_objset_userquota_get_ids(). Secondly, don't check db->db_blkid in dbuf_try_add_ref(), but use the blkid argument instead. Checking db->db_blkid may be unsafe since we don't yet have a hold on the dbuf so its validity is unknown. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#3443
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that when we are using the bonus dbuf we are either holding a reference on it or have taken dn_struct_rwlock. Secondly, don't check db->db_blkid in dbuf_try_add_ref(), but use the blkid argument instead. Checking db->db_blkid may be unsafe since we don't yet have a hold on the dbuf so its validity is unknown. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#3443
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that when we are using the bonus dbuf we are either holding a reference on it or have taken dn_struct_rwlock. Signed-off-by: Ned Bass <[email protected]> Issue openzfs#3443
- Don't check db->db_blkid, but use the blkid argument instead. Checking db->db_blkid may be unsafe since we don't yet have a hold on the dbuf so its validity is unknown.
- Call mutex_exit() on found_db, not db, since it's not certain that they point to the same dbuf, and the mutex was taken on found_db.
Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3443
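The two points in that commit message can be shown with a small model. This is a hedged sketch, not the ZFS code: `buf_t`, `table_find_locked()` and `buf_try_add_ref()` are hypothetical stand-ins, and the one-slot-per-bucket table is only there to make the example self-contained. The lookup key comes from the caller's argument rather than from the possibly stale pointer, and the unlock is issued on the buffer the lookup actually locked.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the ZFS structures. */
typedef struct buf {
    pthread_mutex_t b_mtx;
    uint64_t b_blkid;
    int b_holds;
} buf_t;

#define TABLE_SIZE 8
static buf_t *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static void buf_init(buf_t *b, uint64_t blkid, int holds)
{
    pthread_mutex_init(&b->b_mtx, NULL);
    b->b_blkid = blkid;
    b->b_holds = holds;
}

static void table_insert(buf_t *b)
{
    pthread_mutex_lock(&table_lock);
    table[b->b_blkid % TABLE_SIZE] = b;
    pthread_mutex_unlock(&table_lock);
}

/* Look up by key; returns the live buffer with b_mtx held, or NULL. */
static buf_t *table_find_locked(uint64_t blkid)
{
    buf_t *found;

    pthread_mutex_lock(&table_lock);
    found = table[blkid % TABLE_SIZE];
    if (found != NULL && found->b_blkid == blkid)
        pthread_mutex_lock(&found->b_mtx);
    else
        found = NULL;
    pthread_mutex_unlock(&table_lock);
    return found;
}

/*
 * Try to take a hold on b, which the caller does not yet hold and which
 * may already be destroyed. b is never dereferenced until the lookup has
 * proven that it is the live buffer registered under blkid.
 */
static int buf_try_add_ref(buf_t *b, uint64_t blkid)
{
    buf_t *found = table_find_locked(blkid); /* key from the argument */
    int result = 0;

    if (found != NULL) {
        if (found == b && b->b_holds > 0) {
            b->b_holds++;
            result = 1;
        }
        /* Unlock what was locked: found, which may differ from b. */
        pthread_mutex_unlock(&found->b_mtx);
    }
    return result;
}
```

The `b_holds > 0` condition is a simplification chosen for this sketch; the exact condition used by dbuf_try_add_ref() is not quoted in this thread.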
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that when we are using the bonus dbuf we are either holding a reference on it or have taken dn_struct_rwlock. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#3443
Re-opening issue. The proposed fix for this has been reverted because it introduced #3789.
This reverts commit 5f8e1e8. It was determined that this patch introduced the quota regression described in #3789. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #3443 Issue #3789
https://reviews.csiden.org/r/245/ may be relevant to this bug
@scsiguy I haven't yet tried your patch, but I'm no longer able to reproduce this bug under ZoL 0.6.5. I could test it under the older version where this bug does occur, but I don't think that would be particularly useful. First I want to find out which change "fixed" this bug in our tree.
The bonus buffer associated with a dnode is expected to remain resident until: the dnode is evicted via dnode_buf_pageout(), the dnode is freed in dnode_sync_free(), or the objset containing the dnode is evicted via dmu_objset_evict(). However, since bonus buffers (and DMU buffers in general) can have draining references when these events occur, dbuf_rele_and_unlock() has logic to ensure that once these late references are released affected buffers will be evicted.

dbuf_rele_and_unlock() currently checks for a dbuf for an evicting objset via the os->os_evicting flag, and detects buffers for a freed dnode by testing dn->dn_type and dn->dn_free_txg fields. Unfortunately, the free'd dnode test can fire prematurely - anytime after the dnode is scheduled to be freed via dnode_free() until the free is committed in dnode_sync_free(). If all references to the bonus buffer are dropped within this window, the bonus buffer will be evicted and code in dnode_sync() that relies on the bonus buffer will fail.

Additionally, the "free'd dnode test" isn't applied to normal buffers (buffers that are not the bonus buffer) and there is no mechanism to guarantee eviction in the dnode_buf_pageout() case (the dnode is not being freed nor is the objset being evicted).

Replace the two existing deferred eviction mechanisms with a per-dbuf flag, db_pending_evict. This is set when explicit eviction is requested via either dnode_evict_dbufs() or dnode_evict_bonus(). These actions only occur after it is safe for dnode buffers to be evicted (e.g. the bonus buffer will not be referenced again).

uts/common/fs/zfs/sys/dbuf.h: Add comments for boolean fields in dmu_buf_impl_t. Add the db_pending_evict field.

uts/common/fs/zfs/sys/dbuf.h: uts/common/fs/zfs/dbuf.c: Rename db_immediate_evict to db_user_immediate_evict to avoid confusion between dbuf user state eviction and deferred eviction of a dbuf.

uts/common/fs/zfs/dbuf.c: Consistently use TRUE/FALSE for boolean fields in dmu_buf_impl_t. Simplify pending eviction logic to use the new db_pending_evict flag in all cases.

uts/common/fs/zfs/dmu_objset.c: uts/common/fs/zfs/sys/dmu_objset.h: Remove objset_t's os_evicting field. This same functionality is now provided by db_pending_evict.

uts/common/fs/zfs/dnode_sync.c: In dnode_evict_dbufs() and dnode_evict_bonus(), mark dbufs with draining references (dbufs that can't be evicted inline) as pending_evict. In dnode_sync_free(), remove ASSERT() that a dnode being free'd has no active dbufs. This is usually the case, but is not guaranteed due to draining references. (e.g. The deadlist for a deleted dataset may still be open if another thread referenced the dataset just before it was freed and the dsl_dataset_t hasn't been released or is still being evicted).

openzfs#3865 openzfs#3443 Ported-by: Ned Bass <[email protected]>
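The per-buffer flag described in this commit message can be modelled in a few lines. This is a single-threaded sketch with all locking omitted; `buf_t`, `buf_request_evict()` and `buf_rele()` are hypothetical names chosen for illustration, not the functions in dbuf.c. An explicit eviction request evicts inline when nothing holds the buffer and otherwise sets the flag; the release path evicts only when the flag is set, so dropping the last hold on a buffer that nobody asked to evict leaves it resident.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer with deferred eviction. */
typedef struct buf {
    int holds;
    bool pending_evict; /* set only by an explicit eviction request */
    bool evicted;
} buf_t;

static void buf_evict(buf_t *b)
{
    b->evicted = true;
}

/* Explicit request: evict inline if unreferenced, otherwise defer. */
static void buf_request_evict(buf_t *b)
{
    if (b->holds == 0)
        buf_evict(b);
    else
        b->pending_evict = true;
}

/* Release one hold; the last release honors a deferred request. */
static void buf_rele(buf_t *b)
{
    b->holds--;
    if (b->holds == 0 && b->pending_evict)
        buf_evict(b);
}
```

Tying eviction to an explicit request, rather than to state inferred from the owning object, is what removes the window in which the older freed-dnode test could fire too early.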
6267 dn_bonus evicted too early Reviewed by: Richard Yao <[email protected]> Reviewed by: Xin LI <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Richard Lowe <[email protected]> References: https://www.illumos.org/issues/6267 illumos/illumos-gate@d205810 Signed-off-by: Brian Behlendorf <[email protected]> Ported-by: Ned Bass <[email protected]> Issue #3865 Issue #3443
Upstream fix f9f5394 applied. Closing.
Reproduced on master 4 commits past 0.6.4 (7fad629). Also 77 commits past 0.6.4 (65037d9). Process running unlink() destroys a dbuf while the txg_sync thread still holds db->db_mtx.
Reproducer was using 64 concurrent copies of xattrtest https://github.com/nedbass/xattrtest/tree/x. The ASSERT hits reliably during the unlink phase. I used this test script:
Here we see txg_sync is the mutex owner: