vdev_mirror: when resilvering, try reading first #12327
Conversation
From experience, resilvers can make it most of the way through and die for one reason or another. Optimistically, then, fling reads at devices being resilvered and respond to checksum mismatches rather than always writing.

As this is a kind of nop-write, and out of an abundance of caution, restrict this to checksum types strong enough to be considered good enough. (This, notably, excludes fletcher4, which is always used for pool metadata.)

Signed-off-by: Nathaniel Wesley Filardo <[email protected]>
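For illustration, a minimal sketch of the proposed control flow. This is not the actual vdev_mirror code path; resilver_block(), read_child(), verify_checksum(), and write_child() are hypothetical helper names introduced only to make the idea concrete, while blkptr_t, vdev_t, abd_t, and abd_free() are real ZFS types and functions.

/*
 * Sketch only: the real logic lives in vdev_mirror_io_start() /
 * vdev_mirror_io_done(). read_child(), verify_checksum(), and
 * write_child() are hypothetical helpers, not ZFS functions.
 */
static void
resilver_block(blkptr_t *bp, vdev_t *new_child, abd_t *good_data)
{
	/* Optimistically read the block from the device being resilvered. */
	abd_t *buf = read_child(new_child, bp);

	if (buf != NULL && verify_checksum(bp, buf)) {
		/* The new side already holds correct data: skip the write. */
		abd_free(buf);
		return;
	}
	if (buf != NULL)
		abd_free(buf);

	/* Checksum mismatch or read failure: fall back to the repair write. */
	write_child(new_child, bp, good_data);
}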
Just referencing an old Illumos / OpenZFS ticket 5532, which covers similar code lines, but not really sure if it's applicable...
Looks still applicable to me. Adding a test for
I think this is saying that when we resilver, we read from both sides (old and new), and then write to the new side. Does this also short-circuit the write if the read from the new side checksums correctly? If so, that means that a resilver would read the entire new disk and then also write it, making resilvers go about half as fast? What's the benefit behind this?
	    !mm->mm_resilvering;
	boolean_t resilver_read = have_checksum &&
	    (zio->io_flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)) &&
	    (zio_checksum_table[BP_GET_CHECKSUM(zio->io_bp)].ci_flags &
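The excerpt above is cut off mid-expression. A plausible completion, assuming the strength test uses the existing ZCHECKSUM_FLAG_NOPWRITE flag (which the sha256-class checksums carry and fletcher4 does not), would be:

	boolean_t resilver_read = have_checksum &&
	    (zio->io_flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)) &&
	    (zio_checksum_table[BP_GET_CHECKSUM(zio->io_bp)].ci_flags &
	    ZCHECKSUM_FLAG_NOPWRITE);	/* assumed; truncated in the excerpt */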
This would be a good place to add a comment explaining the reasoning behind this.
For some drive technologies, most notably SMR, writing can be significantly slower than reading, and this can significantly improve recovery from aborted resilvers or from the ill effects of pre-#11930 scans of mirrors with offline devices. That said, it's probably a good idea to request this read-before-write strategy only on some scans and on some pools, so that we don't pay the read-then-write slowdown when we don't expect the new side to already hold the correct data. That might mean a new ZIO flag controlled by the scan logic. I'll flesh the proposal out to include that if you think this change isn't a challenge to data integrity.
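A minimal sketch of that gating, assuming a hypothetical ZIO_FLAG_OPTIMISTIC_READ flag (not part of the current ZIO_FLAG_* set) that the scan logic would set only when the new side is expected to already hold the data:

	/*
	 * ZIO_FLAG_OPTIMISTIC_READ is hypothetical: the scan logic would
	 * set it only for scans/pools where the target device is expected
	 * to already be populated, leaving other resilvers write-only.
	 */
	boolean_t resilver_read = have_checksum &&
	    (zio->io_flags & ZIO_FLAG_OPTIMISTIC_READ) &&
	    (zio_checksum_table[BP_GET_CHECKSUM(zio->io_bp)].ci_flags &
	    ZCHECKSUM_FLAG_NOPWRITE);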
Will you then add the
A polite ping, @nwf: could you rebase this? Many thanks in advance.
I think we should address the resilver-restarts issue by reducing the number of block pointers we needlessly accumulate in RAM for sorting when the pool is not fragmented, so that we could create checkpoints more often. I don't think it is reasonable to expect, during a general-case resilver (unlike a scrub), that the target device already has all the data. I agree with @ahrens that this would take at least twice as long for new disks.
@amotin And is just the one liner
Motivation and Context
From experience, resilvers can make it most of the way through and die for one reason or another.
Description
Optimistically, then, fling reads at devices being resilvered and respond to checksum mismatches rather than always writing.
As this is a kind of nop-write, and out of an abundance of caution, restrict this to checksum types strong enough to be considered good enough. (This, notably, excludes fletcher4, which is always used for pool metadata.)
Signed-off-by: Nathaniel Wesley Filardo [email protected]
How Has This Been Tested?
Local use with zhack scrub.
Types of changes
Checklist:
Signed-off-by.