Skip to content

Commit

Permalink
Correct missing zil_claim() DTL updates
Browse files Browse the repository at this point in the history
Commit a1d477c accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#11218
  • Loading branch information
behlendorf authored and jsai20 committed Mar 30, 2021
1 parent b28fcb5 commit b330d3b
Showing 1 changed file with 18 additions and 14 deletions.
32 changes: 18 additions & 14 deletions module/zfs/vdev.c
Original file line number Diff line number Diff line change
Expand Up @@ -2651,15 +2651,12 @@ vdev_dtl_contains(vdev_t *vd, vdev_dtl_type_t t, uint64_t txg, uint64_t size)

/*
* While we are loading the pool, the DTLs have not been loaded yet.
* Ignore the DTLs and try all devices. This avoids a recursive
* mutex enter on the vdev_dtl_lock, and also makes us try hard
* when loading the pool (relying on the checksum to ensure that
* we get the right data -- note that we while loading, we are
* only reading the MOS, which is always checksummed).
* This isn't a problem but it can result in devices being tried
* which are known to not have the data. In which case, the import
* is relying on the checksum to ensure that we get the right data.
* Note that while importing we are only reading the MOS, which is
* always checksummed.
*/
if (vd->vdev_spa->spa_load_state != SPA_LOAD_NONE)
return (B_FALSE);

mutex_enter(&vd->vdev_dtl_lock);
if (!range_tree_is_empty(rt))
dirty = range_tree_contains(rt, txg, size);
Expand Down Expand Up @@ -2977,6 +2974,7 @@ vdev_dtl_load(vdev_t *vd)
{
spa_t *spa = vd->vdev_spa;
objset_t *mos = spa->spa_meta_objset;
range_tree_t *rt;
int error = 0;

if (vd->vdev_ops->vdev_op_leaf && vd->vdev_dtl_object != 0) {
Expand All @@ -2988,10 +2986,17 @@ vdev_dtl_load(vdev_t *vd)
return (error);
ASSERT(vd->vdev_dtl_sm != NULL);

mutex_enter(&vd->vdev_dtl_lock);
error = space_map_load(vd->vdev_dtl_sm,
vd->vdev_dtl[DTL_MISSING], SM_ALLOC);
mutex_exit(&vd->vdev_dtl_lock);
rt = range_tree_create(NULL, RANGE_SEG64, NULL, 0, 0);
error = space_map_load(vd->vdev_dtl_sm, rt, SM_ALLOC);
if (error == 0) {
mutex_enter(&vd->vdev_dtl_lock);
range_tree_walk(rt, range_tree_add,
vd->vdev_dtl[DTL_MISSING]);
mutex_exit(&vd->vdev_dtl_lock);
}

range_tree_vacate(rt, NULL, NULL);
range_tree_destroy(rt);

return (error);
}
Expand Down Expand Up @@ -4459,8 +4464,7 @@ vdev_stat_update(zio_t *zio, uint64_t psize)
if (zio->io_vd == NULL && (zio->io_flags & ZIO_FLAG_DONT_PROPAGATE))
return;

if (spa->spa_load_state == SPA_LOAD_NONE &&
type == ZIO_TYPE_WRITE && txg != 0 &&
if (type == ZIO_TYPE_WRITE && txg != 0 &&
(!(flags & ZIO_FLAG_IO_REPAIR) ||
(flags & ZIO_FLAG_SCAN_THREAD) ||
spa->spa_claiming)) {
Expand Down

0 comments on commit b330d3b

Please sign in to comment.