Zpool split can create a corrupted pool #7865

Closed
pzakha opened this issue Sep 4, 2018 · 4 comments

@pzakha
Contributor

pzakha commented Sep 4, 2018

System information

Type                  Version/Name
Distribution Name     Delphix ZFS (Running on Ubuntu 18.04)
Distribution Version  N/A
Linux Kernel          4.15.0-33-generic
Architecture          x86_64
ZFS Version           master (https://github.com/delphix/zfs/commit/25770d93fe64efaeb20c42d0326ef88dece90373)
SPL Version           N/A

Describe the problem you're observing

When calling zpool split on a pool that has a mirror being resilvered, the new pool that is split-off will be corrupted. See original investigation here: #7856 (comment)

Note that this issue also affects OpenZFS (Illumos).

Describe how to reproduce the problem

Create a mirrored pool, offline the second device in the mirror, write data to the pool, online the second device, then split the pool before the resilver has completed. Note that if you offline the first device instead, the zpool split operation will fail with EBUSY. See the link in the description for more context.

Steps:

$ sudo zpool create p1 mirror xvdb xvdc
$ sudo zpool offline p1 xvdc
$ sudo dd if=/dev/urandom of=/p1/f1 bs=128k count=40000
$ sudo zpool online p1 xvdc; sleep 1; sudo zpool status p1
  pool: p1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep  4 17:39:22 2018
	4.89G scanned at 1001M/s, 485M issued at 97.1M/s, 4.89G total
	477M resilvered, 9.70% done, 0 days 00:00:46 to go
config:

	NAME        STATE     READ WRITE CKSUM
	p1          ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    xvdb    ONLINE       0     0     0
	    xvdc    ONLINE       0     0     0

errors: No known data errors
$ sudo zpool split p1 p2
$ sudo zpool status p1
  pool: p1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep  4 17:39:22 2018
	4.89G scanned at 132M/s, 3.97G issued at 107M/s, 4.89G total
	1.86G resilvered, 81.31% done, 0 days 00:00:08 to go
config:

	NAME        STATE     READ WRITE CKSUM
	p1          ONLINE       0     0     0
	  xvdb      ONLINE       0     0     0

errors: No known data errors
$ sudo zpool import p2
$ sudo zpool scrub p2
$ sudo zpool status p2
  pool: p2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:01:06 with 24856 errors on Tue Sep  4 17:41:38 2018
config:

	NAME        STATE     READ WRITE CKSUM
	p2          DEGRADED     0     0 24.3K
	  xvdc      DEGRADED     0     0 48.5K  too many errors

errors: 24856 data errors, use '-v' for a list

Include any warning/errors/backtraces from the system logs

@rstrlcpy
Contributor

rstrlcpy commented Sep 7, 2018

@pzakha what do you think about the following fix for the issue:

diff --git a/usr/src/uts/common/fs/zfs/spa.c b/usr/src/uts/common/fs/zfs/spa.c
index 2466662ac3..c3cb386025 100644
--- a/usr/src/uts/common/fs/zfs/spa.c
+++ b/usr/src/uts/common/fs/zfs/spa.c
@@ -5611,7 +5611,7 @@ spa_vdev_split_mirror(spa_t *spa, char *newname, nvlist_t *config,
                        break;
                }
 
-               if (vdev_dtl_required(vml[c])) {
+               if (vdev_dtl_required(vml[c]) || vdev_resilver_needed(vml[c], NULL, NULL)) {
                        error = SET_ERROR(EBUSY);
                        break;
                }
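
With a check like this in place, re-running the reproduction above should cause the split to be rejected while the resilver is still in progress, instead of producing a corrupted p2. Roughly (the expected behavior below is illustrative, not captured from a patched build):

$ sudo zpool split p1 p2      # expected to fail with EBUSY while xvdc
                              # still has DTL_MISSING entries
$ sudo zpool status p1        # p1 keeps both mirror children and the
                              # resilver runs to completion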

@pzakha
Contributor Author

pzakha commented Sep 7, 2018

@ramzec That looks reasonable to me. To be safe we could also check that vdev_dtl_lists are empty for vml[c].

@rstrlcpy
Copy link
Contributor

rstrlcpy commented Sep 7, 2018

@pzakha vdev_resilver_needed() checks DTL_MISSING, so per the following comment from vdev.c there is no reason to also check the vdev_dtl lists:

For leaf vdevs, DTL_MISSING and DTL_PARTIAL are identical: the device either has the data or it doesn't.

Do you agree?
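
For reference, the leaf-vdev case of vdev_resilver_needed() boils down to something like the sketch below (paraphrased and abridged, not a verbatim excerpt; exact helpers and locking may differ between Illumos and ZFS on Linux):

	/*
	 * Sketch of the leaf-vdev path of vdev_resilver_needed():
	 * a leaf needs resilvering as soon as its DTL_MISSING range
	 * tree is non-empty, which is exactly the state a split-off
	 * child is in while the resilver is still running.
	 */
	boolean_t needed = B_FALSE;

	mutex_enter(&vd->vdev_dtl_lock);
	if (range_tree_space(vd->vdev_dtl[DTL_MISSING]) != 0 &&
	    vdev_writeable(vd))
		needed = B_TRUE;
	mutex_exit(&vd->vdev_dtl_lock);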

@pzakha
Contributor Author

pzakha commented Sep 9, 2018

@ramzec Yeah, I think vdev_resilver_needed() should indeed be sufficient as I believe it will check the DTLs for all txgs, including the un-synced ones.
