Invoke zdb by guid to avoid import errors #15298

Merged (1 commit) on Sep 22, 2023

Conversation

pcd1193182
Contributor

Motivation and Context

Occasionally, zloop fails because zdb can't import the ztest pool for debugging. This may have multiple causes; this PR only fixes one of them.

Description

The problem was that ztest removed a device and replaced it with another, and the pool was then reguided. The import then failed because there were two possible imports with the same name: one with the new guid and one with the old. This can happen because the label writes from the device removal/replacement can be subject to ztest's error injection. This PR invokes zdb by guid instead of by name, which avoids the collision.

There are two other ways to fix this. One is to change the error injection so that it does not trigger on removals, which may not be technically feasible. The other is to change the import code so that it does not report configurations that are so short on devices, which could have unpleasant effects for end users trying to recover from data loss or device configuration issues.
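
The name collision described above can be illustrated with a small model. This is a hypothetical sketch in Python, not the zloop or zdb code: the `Candidate` type, the guid values, and the device counts are all invented for illustration. It only shows why a lookup by name becomes ambiguous when a stale label survives, while a lookup by guid still identifies one configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One importable pool configuration, as a toy model (hypothetical)."""
    name: str
    guid: int
    devices: int


def find_by_name(candidates, name):
    """Return the single candidate with this name, or fail if ambiguous."""
    matches = [c for c in candidates if c.name == name]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} candidates named {name!r}")
    return matches[0]


def find_by_guid(candidates, guid):
    """Return the single candidate with this guid, or fail if not unique."""
    matches = [c for c in candidates if c.guid == guid]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} candidates with guid {guid:#x}")
    return matches[0]


# Invented values: a stale configuration still carrying the old guid,
# and the current configuration carrying the new guid. Both share a name.
candidates = [
    Candidate(name="ztest", guid=0x1111, devices=1),
    Candidate(name="ztest", guid=0x2222, devices=4),
]

try:
    find_by_name(candidates, "ztest")
except LookupError as err:
    print("lookup by name failed:", err)

print("lookup by guid found:", find_by_guid(candidates, 0x2222))
```

In this model the guid is the only field that tells the two candidates apart, which is the property the fix relies on.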

How Has This Been Tested?

zloop runs

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Sep 20, 2023
@pcd1193182 pcd1193182 force-pushed the zloop_crash branch 3 times, most recently from ee23a24 to 01dc612 Compare September 21, 2023 20:41
@behlendorf behlendorf merged commit 2e2a46e into openzfs:master Sep 22, 2023
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 26, 2023
The problem that was occurring is basically that a device was removed 
by ztest and replaced with another device. It was then reguided. The 
import then failed because there were two possible imports with the 
same name; one with the new guid, and one with the old. This can 
happen because the label writes from the device removal/replacement 
can be subject to ztest's error injection. 

The other ways to fix this would be to change the error injection to 
not trigger on removals (which may not be technically feasible), or 
to change the import code to not report configurations that are so 
short on devices (which would potentially have unpleasant end-user 
effects when trying to recover from data losses/device configuration 
issues).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes openzfs#15298
behlendorf pushed a commit that referenced this pull request Sep 28, 2023
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023