Zpool split can create a corrupted pool #7865

Closed
pzakha opened this issue Sep 4, 2018 · 4 comments

@pzakha
Contributor

pzakha commented Sep 4, 2018

System information

Type                  Version/Name
Distribution Name     Delphix ZFS (Running on Ubuntu 18.04)
Distribution Version  N/A
Linux Kernel          4.15.0-33-generic
Architecture          x86_64
ZFS Version           master (https://github.com/delphix/zfs/commit/25770d93fe64efaeb20c42d0326ef88dece90373)
SPL Version           N/A

Describe the problem you're observing

When calling zpool split on a pool that has a mirror being resilvered, the new pool that is split-off will be corrupted. See original investigation here: #7856 (comment)

Note that this issue also affects OpenZFS (Illumos).

Describe how to reproduce the problem

Create a mirrored pool, offline the second device in the mirror, write data to the pool, online the second device, then split the pool before the resilver has completed. Note that if you offline the first device instead, the zpool split operation will fail with EBUSY. See the link in the description for more context.

Steps:

$ sudo zpool create p1 mirror xvdb xvdc
$ sudo zpool offline p1 xvdc
$ sudo dd if=/dev/urandom of=/p1/f1 bs=128k count=40000
$ sudo zpool online p1 xvdc; sleep 1; sudo zpool status p1
  pool: p1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep  4 17:39:22 2018
	4.89G scanned at 1001M/s, 485M issued at 97.1M/s, 4.89G total
	477M resilvered, 9.70% done, 0 days 00:00:46 to go
config:

	NAME        STATE     READ WRITE CKSUM
	p1          ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    xvdb    ONLINE       0     0     0
	    xvdc    ONLINE       0     0     0

errors: No known data errors
$ sudo zpool split p1 p2
$ sudo zpool status p1
  pool: p1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep  4 17:39:22 2018
	4.89G scanned at 132M/s, 3.97G issued at 107M/s, 4.89G total
	1.86G resilvered, 81.31% done, 0 days 00:00:08 to go
config:

	NAME        STATE     READ WRITE CKSUM
	p1          ONLINE       0     0     0
	  xvdb      ONLINE       0     0     0

errors: No known data errors
$ sudo zpool import p2
$ sudo zpool scrub p2
$ sudo zpool status p2
  pool: p2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:01:06 with 24856 errors on Tue Sep  4 17:41:38 2018
config:

	NAME        STATE     READ WRITE CKSUM
	p2          DEGRADED     0     0 24.3K
	  xvdc      DEGRADED     0     0 48.5K  too many errors

errors: 24856 data errors, use '-v' for a list

Include any warning/errors/backtraces from the system logs

@rstrlcpy
Contributor

rstrlcpy commented Sep 7, 2018

@pzakha what do you think about the following fix for the issue:

diff --git a/usr/src/uts/common/fs/zfs/spa.c b/usr/src/uts/common/fs/zfs/spa.c
index 2466662ac3..c3cb386025 100644
--- a/usr/src/uts/common/fs/zfs/spa.c
+++ b/usr/src/uts/common/fs/zfs/spa.c
@@ -5611,7 +5611,7 @@ spa_vdev_split_mirror(spa_t *spa, char *newname, nvlist_t *config,
                        break;
                }
 
-               if (vdev_dtl_required(vml[c])) {
+               if (vdev_dtl_required(vml[c]) || vdev_resilver_needed(vml[c], NULL, NULL)) {
                        error = SET_ERROR(EBUSY);
                        break;
                }
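
With a check like this in place, re-running the reproduction above should cause the split to be rejected while the resilver is still in progress, instead of producing a corrupted p2. Roughly (the expected behavior below is illustrative, not captured from a patched build):

$ sudo zpool split p1 p2      # expected to fail with EBUSY while xvdc
                              # still has DTL_MISSING entries
$ sudo zpool status p1        # p1 keeps both mirror children and the
                              # resilver runs to completion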

@pzakha
Contributor Author

pzakha commented Sep 7, 2018

@ramzec That looks reasonable to me. To be safe we could also check that vdev_dtl_lists are empty for vml[c].

@rstrlcpy
Copy link
Contributor

rstrlcpy commented Sep 7, 2018

@pzakha vdev_resilver_needed() checks DTL_MISSING, so per the following comment from vdev.c there is no reason to also check the vdev_dtl lists:

For leaf vdevs, DTL_MISSING and DTL_PARTIAL are identical: the device either has the data or it doesn't.

Do you agree?
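
For reference, the leaf-vdev case of vdev_resilver_needed() boils down to something like the sketch below (paraphrased and abridged, not a verbatim excerpt; exact helpers and locking may differ between Illumos and ZFS on Linux):

	/*
	 * Sketch of the leaf-vdev path of vdev_resilver_needed():
	 * a leaf needs resilvering as soon as its DTL_MISSING range
	 * tree is non-empty, which is exactly the state a split-off
	 * child is in while the resilver is still running.
	 */
	boolean_t needed = B_FALSE;

	mutex_enter(&vd->vdev_dtl_lock);
	if (range_tree_space(vd->vdev_dtl[DTL_MISSING]) != 0 &&
	    vdev_writeable(vd))
		needed = B_TRUE;
	mutex_exit(&vd->vdev_dtl_lock);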

@pzakha
Contributor Author

pzakha commented Sep 9, 2018

@ramzec Yeah, I think vdev_resilver_needed() should indeed be sufficient as I believe it will check the DTLs for all txgs, including the un-synced ones.
