
Fix ZED auto-replace for VDEVs using by-id paths #15363

Merged: 2 commits, Oct 20, 2023


don-brady (Contributor)

Motivation and Context

The ZFS auto-replace mechanism detects a blank disk inserted in place of a degraded/faulted VDEV, automatically partitions the disk, and issues a VDEV replacement.

Two kinds of /dev/disk/by-xxx paths are involved: by-id and by-vdev.
In the 'by-vdev' case, the path is both persistent and physical (it names the slot, not the disk).
In the 'by-id' case, the path is persistent and also unique (it names the individual disk).
During an auto-replace, the newly partitioned disk will have a different 'by-id' name; however, the 'by-vdev' name will not change.

A regression in the distant past changed the ZED auto-replace code so that it attempts to use the old 'by-id' name during the replacement, but that name no longer exists in the 'by-id' namespace. This causes the auto-replace operation to fail.

Note that since 'by-vdev' names don't change, it's perfectly fine to reuse the previous VDEV path when the device paths are 'by-vdev'.
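The path-selection rule described above can be sketched as follows. This is a minimal illustration in Python, not the actual ZED code (which is written in C); the helper name `replacement_path`, its signature, and the idea that the new disk's by-id path is passed in as an argument are assumptions made for this example, and the device names are made up.

```python
from typing import Optional

BY_VDEV_PREFIX = "/dev/disk/by-vdev/"
BY_ID_PREFIX = "/dev/disk/by-id/"


def replacement_path(old_vdev_path: str, new_disk_by_id: Optional[str]) -> str:
    """Pick the device path to use when replacing a VDEV with a new disk.

    Hypothetical helper for illustration only; not the ZED implementation.
    """
    if old_vdev_path.startswith(BY_VDEV_PREFIX):
        # by-vdev names describe the physical slot, so they survive the
        # disk swap and the previous VDEV path can be reused unchanged.
        return old_vdev_path
    if old_vdev_path.startswith(BY_ID_PREFIX):
        # by-id names are unique to each disk: the old name disappeared
        # with the old disk, so the new disk's by-id path must be used.
        if new_disk_by_id is None:
            raise ValueError("a by-id VDEV needs the new disk's by-id path")
        return new_disk_by_id
    # Other path types are outside the scope of this sketch.
    return old_vdev_path


if __name__ == "__main__":
    # Made-up device names.
    print(replacement_path("/dev/disk/by-vdev/A0",
                           "/dev/disk/by-id/ata-NEW_SERIAL"))
    # prints /dev/disk/by-vdev/A0
    print(replacement_path("/dev/disk/by-id/ata-OLD_SERIAL",
                           "/dev/disk/by-id/ata-NEW_SERIAL"))
    # prints /dev/disk/by-id/ata-NEW_SERIAL
```

The regression amounts to taking the first branch for both path types: reusing the old by-id name points the replacement at a path that no longer exists.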

Description

The change is simple: restore the original code so that the VDEV path is updated when using by-id paths.
The more challenging part was devising a second ZTS test that exercises auto-replace for 'by-id' paths and helps prevent a future regression.

With that new test, we can now do an A|B test with and without the fix to confirm that auto-replace for by-id paths works. The existing auto-replace test, functional/fault/auto_replace_001_pos, confirms that we didn't break auto-replace for 'by-vdev' paths.

In the original functional/fault/auto_replace_001_pos test, the disk wipe (using dd) did not actually remove the partitioning, because the kernel was never informed of the wipe and kept its in-memory view of the old partitions.

  • Added a call to wipefs(8) so that the kernel is informed and ZED will re-partition the device.
  • Added a validation step that confirms the repartitioning occurred by checking that the GPT partition UUID changes.
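The wipe-and-verify steps above could look roughly like the following Python sketch. It is not the ZTS test itself (ZTS tests are ksh scripts); the helper names are hypothetical, and the sketch assumes util-linux's `lsblk(8)` and `wipefs(8)` are installed and that `lsblk` reports the GPT partition UUID in its `PARTUUID` column.

```python
import subprocess
import uuid


def read_partuuid(partition: str) -> str:
    """Return the GPT partition UUID of `partition` as reported by lsblk(8).

    Hypothetical helper; -n drops the header, -o selects the PARTUUID column.
    """
    out = subprocess.run(
        ["lsblk", "-n", "-o", "PARTUUID", partition],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.strip()


def wipe_disk(disk: str) -> None:
    """Erase all signatures on `disk` with wipefs(8).

    Unlike overwriting the start of the disk with dd, wipefs is the tool
    the PR switched to so that the kernel learns the partitions are gone.
    Destructive: only run this against a scratch test disk.
    """
    subprocess.run(["wipefs", "-a", disk], check=True)


def repartitioned(before: str, after: str) -> bool:
    """True if both values are well-formed UUIDs and they differ."""
    try:
        old, new = uuid.UUID(before), uuid.UUID(after)
    except ValueError:
        # An empty or malformed value means the partition is missing,
        # so re-partitioning cannot be confirmed.
        return False
    return old != new
```

In a test, one would record `read_partuuid()` for the first partition, call `wipe_disk()` on the whole disk, wait for ZED to auto-replace and re-partition it, then require `repartitioned(before, after)` to be true. The comparison is kept separate from the commands so it can be checked without touching a real device.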

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.

How Has This Been Tested?

ZTS functional/fault tests
Audit ZED logging from the tests to confirm that ZED auto-replace was functioning as expected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


Signed-off-by: Don Brady <[email protected]>
@behlendorf added the "Component: ZED" (ZFS Event Daemon) label, Oct 6, 2023
@don-brady added the "Status: Code Review Needed" label, Oct 6, 2023
@behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" label, Oct 20, 2023
@behlendorf merged commit f0f330e into openzfs:master, Oct 20, 2023
19 checks passed
@don-brady deleted the zed-auto-replace-by-id branch, October 25, 2023 14:49
ixhamza pushed a commit to truenas/zfs that referenced this pull request Nov 20, 2023
The change is simple -- restore the original code so that the VDEV 
path is updated when using by-id paths.  The more challenging part 
was to devise a second ZTS test that would test auto-replace for
'by-id' and help prevent a future regression.

With that new test, we can now do an A|B test with and without
the fix to confirm that auto-replace for by-id paths works. The
existing auto-replace test, functional/fault/auto_replace_001_pos, 
will confirm that we didn't break auto-replace for 'by-vdev' paths.

In the original functional/fault/auto_replace_001_pos test, the disk 
wipe (using dd) was not effective in removing the partitioning since 
the kernel was never informed of the wipe.

Added a call to wipefs(8) so that the kernel is informed and ZED will 
re-partition the device.

Added a validation step confirming that the re-partitioning
occurred by checking that the GPT partition UUID changes.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes openzfs#15363
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
Labels
Component: ZED (ZFS Event Daemon), Status: Accepted (ready to integrate: reviewed, tested)
4 participants