
vdev_mirror: when resilvering, try reading first #12327

Closed
wants to merge 1 commit

Conversation

@nwf (Contributor) commented Jul 4, 2021

Motivation and Context

From experience, resilvers can make it most of the way through and die for one reason or another.

Description

Optimistically, then, fling reads at devices being resilvered and respond to checksum mismatches rather than always writing.

As this is a kind of nop-write, and out of an abundance of caution, restrict this to checksum types strong enough to be considered good enough. (This notably excludes fletcher4, which is always used for pool metadata.)
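
For illustration, a minimal self-contained model of that gate, in the spirit of the change. The flag name, table layout, and checksum set below are simplified stand-ins, not the actual zio_checksum_table entries (which checksums qualify as "strong" in the real patch is an assumption here):

```c
#include <stdio.h>

/*
 * Toy model of the gating idea (not the actual OpenZFS code): each
 * checksum type carries capability flags, and the read-first path is
 * taken only for checksums strong enough that a matching checksum
 * implies matching data -- the property nopwrite relies on.  Fletcher4
 * deliberately lacks the flag, so metadata is always rewritten.
 */
#define	CK_FLAG_STRONG	(1u << 0)	/* collision-resistant enough */

struct ck_info {
	const char *name;
	unsigned flags;
};

static const struct ck_info ck_table[] = {
	{ "fletcher4",	0 },		/* fast but weak: always write */
	{ "sha256",	CK_FLAG_STRONG },
	{ "sha512",	CK_FLAG_STRONG },
	{ "skein",	CK_FLAG_STRONG },
};

int
main(void)
{
	for (size_t i = 0; i < sizeof (ck_table) / sizeof (ck_table[0]); i++) {
		int read_first = (ck_table[i].flags & CK_FLAG_STRONG) != 0;
		printf("%-9s -> %s\n", ck_table[i].name,
		    read_first ? "try read-first" : "always write");
	}
	return (0);
}
```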

Signed-off-by: Nathaniel Wesley Filardo [email protected]

How Has This Been Tested?

Local use with zhack scrub.

Types of changes

  • Performance enhancement (non-breaking change which improves efficiency)

@jumbi77 (Contributor) commented Jul 4, 2021

Just referencing an old Illumos / OpenZFS ticket 5532, which covers similar code lines... but not really sure if it's applicable...

@nwf (Contributor, author) commented Jul 4, 2021

> Just referencing an old Illumos / OpenZFS ticket 5532, which covers similar code lines... but not really sure if it's applicable...

Looks still applicable to me. Adding a test for mm->mm_children > 1 to the conditional here will cause the mm->mm_children == 1 case to fall through to vdev_mirror_child_select(), which will pick the one child (barring badness) and reach the later zio calls in vdev_mirror_io_start(), avoiding abd_alloc_sametype(), bcopy(), and abd_free(). In fact, it might be nice to reach that case when there's one child left standing after the test introduced by #11930, too.
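
A toy model of the control flow being described, assuming the field is mm_children as in the later comments; this illustrates the dispatch only and is not the real vdev_mirror.c:

```c
#include <stdio.h>

/*
 * Illustration only: with the mm_children > 1 test added, a
 * single-child mirror skips the read-all-children scrub path and falls
 * through to ordinary child selection, avoiding the extra buffer
 * allocation and copy (abd_alloc_sametype/bcopy/abd_free in the real
 * code).
 */
typedef struct {
	int mm_children;
	int mm_resilvering;
} mirror_map_t;

enum read_path { READ_ALL_CHILDREN, SELECT_ONE_CHILD };

static enum read_path
dispatch_read(const mirror_map_t *mm, int is_scrub)
{
	if (is_scrub && !mm->mm_resilvering && mm->mm_children > 1)
		return (READ_ALL_CHILDREN);	/* compare copies */
	return (SELECT_ONE_CHILD);	/* vdev_mirror_child_select() path */
}

int
main(void)
{
	mirror_map_t one = { .mm_children = 1, .mm_resilvering = 0 };
	mirror_map_t two = { .mm_children = 2, .mm_resilvering = 0 };

	printf("scrub, 1 child   : %s\n", dispatch_read(&one, 1) ==
	    SELECT_ONE_CHILD ? "select one child" : "read all children");
	printf("scrub, 2 children: %s\n", dispatch_read(&two, 1) ==
	    SELECT_ONE_CHILD ? "select one child" : "read all children");
	return (0);
}
```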

@behlendorf added the Status: Design Review Needed (Architecture or design is under discussion) label Jul 6, 2021
@ahrens (Member) left a comment

I think this is saying that when we resilver, we read from both sides (old and new), and then write to the new side. Does this also short-circuit the write if the read from the new side checksums correctly? If so, that means that a resilver would read the entire new disk and then also write it, making resilvers go about half as fast? What's the benefit behind this?

```c
	    !mm->mm_resilvering;
	boolean_t resilver_read = have_checksum &&
	    (zio->io_flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)) &&
	    (zio_checksum_table[BP_GET_CHECKSUM(zio->io_bp)].ci_flags &
```
@ahrens (Member) commented on this diff:

This would be a good place to add a comment explaining the reasoning behind this.
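
For illustration, a comment along the requested lines, drawing only on the PR description, might read something like:

```c
/*
 * When this I/O is part of a scrub or resilver, optimistically try
 * reading the child being resilvered first and only rewrite it on a
 * checksum mismatch.  Because this is effectively a nop-write, require
 * a checksum strong enough that a match implies matching data;
 * fletcher4 (always used for pool metadata) does not qualify.
 */
```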

@nwf (Contributor, author) commented Jul 9, 2021

For some drive technologies, most notably SMR, writing can be significantly slower than reading, so this can substantially speed up recovery from aborted resilvers, or from the ill effects of pre-#11930 scans of mirrors with offline devices.

That said, it's probably a good idea to request this read-before-write strategy only on some scans and on some pools, so that we don't pay the read-then-write slowdown when we don't expect the new side to already be full of the correct data. That might mean a new ZIO flag that's controlled by the scan logic. I'll flesh the proposal out to include that if you think this change isn't a challenge to data integrity.
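
A sketch of how such an opt-in flag might gate the behavior. The flag name ZIO_FLAG_TRY_READ_FIRST and the constant values below are hypothetical, not anything defined by the PR or by OpenZFS:

```c
#include <stdio.h>

/*
 * Hypothetical sketch only: the idea floated above is that the scan
 * logic would set an opt-in flag on resilver I/Os when the new side is
 * expected to already hold correct data, so the read-before-write path
 * is only taken when it is likely to pay off.  Flag values are toy.
 */
#define	ZIO_FLAG_SCRUB		(1u << 0)
#define	ZIO_FLAG_RESILVER	(1u << 1)
#define	ZIO_FLAG_TRY_READ_FIRST	(1u << 2)	/* hypothetical opt-in */

static int
resilver_should_read_first(unsigned io_flags, int checksum_is_strong)
{
	/*
	 * Only read first when the scan logic opted in and the block's
	 * checksum can actually prove the existing data is correct.
	 */
	return ((io_flags & ZIO_FLAG_TRY_READ_FIRST) &&
	    (io_flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)) &&
	    checksum_is_strong);
}

int
main(void)
{
	printf("opt-in + strong checksum: %d\n",
	    resilver_should_read_first(
	    ZIO_FLAG_RESILVER | ZIO_FLAG_TRY_READ_FIRST, 1));
	printf("no opt-in               : %d\n",
	    resilver_should_read_first(ZIO_FLAG_RESILVER, 1));
	return (0);
}
```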

@jumbi77 (Contributor) commented Jul 12, 2021

> Just referencing an old Illumos / OpenZFS ticket 5532, which covers similar code lines... but not really sure if it's applicable...
>
> Looks still applicable to me. Adding a test for mm->mm_children > 1 to the conditional here will cause the mm->mm_children == 1 case to fall through to vdev_mirror_child_select(), which will pick the one child (barring badness) and reach the later zio calls in vdev_mirror_io_start(), avoiding abd_alloc_sametype(), bcopy(), and abd_free(). In fact, it might be nice to reach that case when there's one child left standing after the test introduced by #11930, too.

Will you then add the mm->mm_children > 1 check?

@jumbi77 (Contributor) commented Sep 18, 2021

@nwf Can you rebase this, resolve the feedback (add the requested comment), and optionally add the mm->mm_children > 1 check? Many thanks in advance; looking forward to getting this upstreamed.

@mmaybee Can you take a look too? Thanks in advance for your review.

@behlendorf added the Status: Revision Needed (Changes are required for the PR to be accepted) label and removed the Status: Design Review Needed (Architecture or design is under discussion) label Sep 16, 2022
@jumbi77 (Contributor) commented Sep 22, 2022

> @nwf Can you rebase this, resolve the feedback (add the requested comment), and optionally add the mm->mm_children > 1 check? Many thanks in advance; looking forward to getting this upstreamed.
>
> @mmaybee Can you take a look too? Thanks in advance for your review.

Politely pinging @nwf: could you rebase this? Many thanks in advance.

@amotin (Member) commented Oct 31, 2024

I think we should address the resilver-restart issue by reducing the number of block pointers we needlessly accumulate in RAM for sorting when the pool is not fragmented, so that we can create checkpoints more often. I don't think it is reasonable to expect, in the general resilver case (unlike scrub), that the target device already has all the data. I agree with @ahrens that this would take at least twice as long for new disks.

@amotin closed this Oct 31, 2024
@jumbi77 (Contributor) commented Nov 1, 2024

> I think we should address the resilver-restart issue by reducing the number of block pointers we needlessly accumulate in RAM for sorting when the pool is not fragmented, so that we can create checkpoints more often. I don't think it is reasonable to expect, in the general resilver case (unlike scrub), that the target device already has all the data. I agree with @ahrens that this would take at least twice as long for new disks.

@amotin And is just the one-liner mm->mm_children > 1 from your original illumos ticket applicable (any benefit without side effects)?

@amotin (Member) commented Nov 1, 2024

> And is just the one-liner mm->mm_children > 1 from your original illumos ticket applicable (any benefit without side effects)?

@jumbi77 If you mean avoiding the extra memory copy, that is already handled by #13606.

Labels: Status: Revision Needed (Changes are required for the PR to be accepted)
6 participants