
Panic when running 'zpool split' #7856

Merged: 1 commit merged into openzfs:master on Mar 22, 2019

Conversation

rstrlcpy
Contributor

Motivation and Context

#5565

Description

From an available crash dump I can see that the freed vdev is accessed because it is still present on a txg's DTL list of the synced vdev; the DTL entry was transferred by vdev_top_transfer(), which was called by vdev_remove_parent(), which in turn was called by vdev_split().

"detach" does the cleanup, but it seems for "split" this cleanup was not implemented.

How Has This Been Tested?

It is difficult to reproduce the issue in general, but we have a host where it can be reproduced very simply:

zpool attach >>> wait for resilvering >>> zpool split
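A scripted version of those steps might look like this (the pool and file names are made up for illustration; this requires root and a ZFS install, and `zpool wait` is only available in newer releases):

```shell
# File-backed toy pool; device and pool names are hypothetical.
truncate -s 256M /tmp/zsplit-d0 /tmp/zsplit-d1
sudo zpool create testpool /tmp/zsplit-d0

# Attach a second device, turning the top-level vdev into a mirror.
sudo zpool attach testpool /tmp/zsplit-d0 /tmp/zsplit-d1

# Wait for resilvering to complete (on older releases, poll 'zpool status').
sudo zpool wait -t resilver testpool

# On the affected host this is where the panic was observed.
sudo zpool split testpool newpool
```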

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@pzakha (Contributor) left a comment

I'm not fully clear on what happens if some of the split vdevs have DTLs. Would that mean that some top-level vdevs in the split pool will end up lacking data for some txgs? Shouldn't we instead check that there are no DTLs and fail if there are any?

module/zfs/spa.c (review thread, outdated and resolved)
@pzakha
Contributor

pzakha commented Sep 4, 2018

We should put this on hold until we investigate a fix for: #7865

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Sep 7, 2018
@behlendorf
Contributor

@ramzec shall we close this out, given that the 733b572 fix was merged?

@rstrlcpy
Contributor Author

I need to double-check, but I'm 99% sure this is a different issue.

@behlendorf behlendorf added Status: Inactive Not being actively updated and removed Status: Work in Progress Not yet ready for general review labels Sep 25, 2018
@ahrens ahrens added the Status: Design Review Needed Architecture or design is under discussion label Sep 27, 2018
@lundman
Contributor

lundman commented Jan 31, 2019

This fixes the identical problem we are having.

@behlendorf behlendorf added this to the 0.8.0 milestone Jan 31, 2019
@ikozhukhov
Contributor

What is the status of this update?

@behlendorf
Contributor

Ready to be revisited, I believe, and in need of reviewers. @ramzec, when you get a moment, would you mind rebasing this on master?

@behlendorf behlendorf removed this from the 0.8.0 milestone Mar 20, 2019
@behlendorf
Contributor

Shouldn't we instead check that there are no DTLs and fail if there are any?

@pzakha good question. By my reading of spa_vdev_split_mirror() we are already doing exactly this: when looping over the vdevs to validate them, zpool split will fail if vdev_dtl_required() determines that removing any of the vdevs would result in a DTL outage.

https://github.com/zfsonlinux/zfs/blob/master/module/zfs/spa.c#L6626L6630

Since the split was determined to be safe, it does look like the right thing to do is to remove the vdev from the other txgs' DTL lists. This is the same procedure employed in spa_vdev_detach().

https://github.com/zfsonlinux/zfs/blob/master/module/zfs/spa.c#L6336L6345

Additionally, we should probably be setting vd->vdev_detached and dirtying the DTL so vdev_dtl_sync() will free the vdev's DTL object.

        vd->vdev_detached = B_TRUE;
        vdev_dirty(tvd, VDD_DTL, vd, txg);
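
Putting the two pieces together, a sketch of how this detach-style cleanup might look inside spa_vdev_split_mirror(), modeled on the spa_vdev_detach() code linked above (this is an illustration, not a quote of the actual merged diff):

```c
/*
 * Purge every reference to vd from the top-level vdev's per-txg
 * DTL lists, then mark it detached and dirty the DTL so that
 * vdev_dtl_sync() will free the vdev's DTL object.
 */
for (int t = 0; t < TXG_SIZE; t++)
	(void) txg_list_remove_this(&tvd->vdev_dtl_list, vd, t);

vd->vdev_detached = B_TRUE;
vdev_dirty(tvd, VDD_DTL, vd, txg);
```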

@pzakha
Contributor

pzakha commented Mar 21, 2019

The original issue that I pointed out was addressed in #7881. This does look like a separate problem, and the fix looks reasonable.

Added missing removal of the detachable vdev from the txg's DTL list
to avoid a use-after-free on the split vdev

Signed-off-by: Roman Strashkin <[email protected]>
@codecov

codecov bot commented Mar 21, 2019

Codecov Report

Merging #7856 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7856      +/-   ##
==========================================
+ Coverage   78.52%   78.54%   +0.02%     
==========================================
  Files         380      380              
  Lines      116324   116327       +3     
==========================================
+ Hits        91344    91371      +27     
+ Misses      24980    24956      -24
Flag Coverage Δ
#kernel 79.11% <100%> (+0.06%) ⬆️
#user 67.03% <0%> (+0.12%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 304d469...d453d7d.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 21, 2019
@behlendorf behlendorf merged commit 234234c into openzfs:master Mar 22, 2019