Fix draid2+2s metadata error on simultaneous 2 drive failures
This patch handles a race condition on the simultaneous failure of
two drives, in which vdev_rebuild_thread() misses the
vdev_rebuild_reset_wanted signal. The flag is now checked again inside
vdev_rebuild_complete_sync() so that the reset request is not lost.

Reviewed-by: Dipak Ghosh <[email protected]>
Reviewed-by: Akash B <[email protected]>
Signed-off-by: Samuel Wycliffe J <[email protected]>
Issue #14041
samwyc committed Oct 18, 2022
1 parent 0aae8a4 commit d69e559
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions module/zfs/vdev_rebuild.c
@@ -22,6 +22,7 @@
 *
 * Copyright (c) 2018, Intel Corporation.
 * Copyright (c) 2020 by Lawrence Livermore National Security, LLC.
 * Copyright (c) 2022 Hewlett Packard Enterprise Development LP.
 */

#include <sys/vdev_impl.h>
@@ -134,6 +135,7 @@ static int zfs_rebuild_scrub_enabled = 1;
 * For vdev_rebuild_initiate_sync() and vdev_rebuild_reset_sync().
 */
static __attribute__((noreturn)) void vdev_rebuild_thread(void *arg);
static void vdev_rebuild_reset_sync(void *arg, dmu_tx_t *tx);

/*
 * Clear the per-vdev rebuild bytes value for a vdev tree.
@@ -307,6 +309,17 @@ vdev_rebuild_complete_sync(void *arg, dmu_tx_t *tx)
	vdev_rebuild_phys_t *vrp = &vr->vr_rebuild_phys;

	mutex_enter(&vd->vdev_rebuild_lock);

	/*
	 * Handle a second device failure if it occurs after all rebuild I/O
	 * has completed but before this sync task has been executed.
	 */
	if (vd->vdev_rebuild_reset_wanted) {
		mutex_exit(&vd->vdev_rebuild_lock);
		vdev_rebuild_reset_sync(arg, tx);
		return;
	}

	vrp->vrp_rebuild_state = VDEV_REBUILD_COMPLETE;
	vrp->vrp_end_time = gethrestime_sec();
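
The check added in this hunk can be illustrated outside of ZFS with a minimal, single-threaded sketch. Everything in it is hypothetical: `toy_vdev`, `toy_complete_sync()`, `toy_reset_sync()` and `toy_second_failure()` are simplified stand-ins, not the real OpenZFS types or functions, and a pthread mutex stands in for `vdev_rebuild_lock`. The patch releases the lock before calling the reset path, which suggests that path acquires the lock itself; the sketch assumes so.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are not the real ZFS types. */
enum toy_state { TOY_REBUILD_ACTIVE, TOY_REBUILD_COMPLETE };

struct toy_vdev {
	pthread_mutex_t lock;	/* plays the role of vdev_rebuild_lock */
	bool reset_wanted;	/* plays the role of vdev_rebuild_reset_wanted */
	enum toy_state state;
	int resets;		/* how many times the rebuild was restarted */
};

static void
toy_vdev_init(struct toy_vdev *vd)
{
	pthread_mutex_init(&vd->lock, NULL);
	vd->reset_wanted = false;
	vd->state = TOY_REBUILD_ACTIVE;
	vd->resets = 0;
}

/* A second device failure asks for the rebuild to be restarted. */
static void
toy_second_failure(struct toy_vdev *vd)
{
	pthread_mutex_lock(&vd->lock);
	vd->reset_wanted = true;
	pthread_mutex_unlock(&vd->lock);
}

/* Reset path: consume the request and leave the rebuild active. */
static void
toy_reset_sync(struct toy_vdev *vd)
{
	pthread_mutex_lock(&vd->lock);
	vd->reset_wanted = false;
	vd->state = TOY_REBUILD_ACTIVE;
	vd->resets++;
	pthread_mutex_unlock(&vd->lock);
}

/* Completion path: re-check the flag under the lock before committing. */
static void
toy_complete_sync(struct toy_vdev *vd)
{
	pthread_mutex_lock(&vd->lock);
	assert(vd->state == TOY_REBUILD_ACTIVE);

	if (vd->reset_wanted) {
		/* Drop the lock first: the reset path takes it itself. */
		pthread_mutex_unlock(&vd->lock);
		toy_reset_sync(vd);
		return;
	}

	vd->state = TOY_REBUILD_COMPLETE;
	pthread_mutex_unlock(&vd->lock);
}
```

Calling `toy_second_failure()` and then `toy_complete_sync()` on a freshly initialized `toy_vdev` leaves the state at `TOY_REBUILD_ACTIVE` with `resets == 1`. Without the flag check, the same sequence would mark the rebuild complete and silently drop the reset request.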
