
scrub vacuously completing instantly on pool from OmniOS #5898

Closed
rincebrain opened this issue Mar 17, 2017 · 4 comments

Comments

@rincebrain
Contributor

Describe the problem you're observing

If you create a pool on OmniOS r151020, then import it on ZoL 0.6.5.9, the pool will appear to function fine and be read/write, but any attempts to scrub will do almost no IO before returning success.

Describe how to reproduce the problem

  • create a 4-disk raidz2 pool on OmniOS r151020
  • write some data to it
  • import on ZoL 0.6.5.9
  • attempt a scrub
  • watch as the scrub completes almost instantly, with barely any drive activity (a command-level sketch of these steps is below)
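
A minimal command-level sketch of these steps, assuming a four-disk raidz2 pool named testpool (device names, pool name, and data size here are placeholders, not the exact setup from the report):

# On OmniOS r151020: create the pool, write some data, then export it
# (illumos-style device names below are placeholders)
zpool create testpool raidz2 c2t1d0 c2t2d0 c2t3d0 c2t4d0
dd if=/dev/urandom of=/testpool/testfile bs=1M count=4096
zpool export testpool

# Move the disks to the ZoL 0.6.5.9 (Debian Jessie) machine, then:
zpool import testpool
zpool scrub testpool
zpool status testpool    # the scrub reports completion almost immediately
zpool iostat testpool 1  # and shows barely any read activity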

Conveniently, I have a set of VDIs from testing this that are suitable.
These are a raidz2 generated on OmniOS r151020:
https://www.dropbox.com/s/77vkxo1q0y7teeu/omnios%20pool%20issue.zip?dl=1
These are a raidz2 generated on Debian Jessie with ZoL 0.6.5.9:
https://www.dropbox.com/s/mo2jv20gnlcqkv0/omnios%20pool%20issue%20jessie.zip?dl=1

(This was originally reported by someone coming into IRC who had made a raidz2 pool on OmniOS, then had to move to a new machine that OmniOS wouldn't run on, so he moved to Linux and hit this issue. I was surprised to find it reproducible so readily.)

My reproduction is based on writing GBs of data to the pool, restarting to be sure none of the pool's pages can still be in cache, then running a scrub and watching the IO, or lack thereof. I included the mostly empty disk images just for convenience.

@loli10K
Contributor

loli10K commented Mar 17, 2017

any attempts to scrub will do almost no IO before returning success.

based on writing GBs of data to the pool

@rincebrain just so we are on the same page, can you reproduce this same issue on a pool filled with GBs of random data? Most of the data contained in the pool you uploaded is just zero-filled files; it's possible the hypervisor is being smart and feeding you zeros at much higher rates than you'd normally be able to get.
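
For reference, one quick way to fill the pool with incompressible data would be something like this (the path and size are just placeholders):

# /dev/urandom output can't be compressed away or zero-detected by the hypervisor
dd if=/dev/urandom of=/testpool/random.bin bs=1M count=2048
sync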

My limited testing shows that by capping the IOPS on the 4 VDIs I'm able to produce more predictable results.

With --total_iops_sec 3 on every virtual disk:

root@debian-8-zfs:~# zpool status testpool
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Fri Mar 17 16:35:06 2017
	305M scanned out of 2.67G at 623K/s, 1h6m to go
	102K repaired, 11.16% done
config:

	NAME        STATE     READ WRITE CKSUM
	testpool    ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	    sde     ONLINE       0     0 2.08K  (repairing)

errors: No known data errors
root@debian-8-zfs:~# zpool iostat testpool 1
              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
testpool    2.67G  13.2G     66      3  5.49M  70.4K
testpool    2.67G  13.2G     11      0  1.50M      0
testpool    2.67G  13.2G     11      0  1.50M      0
testpool    2.67G  13.2G     11      0  1.50M      0
testpool    2.67G  13.2G     11      0  1.37M      0
testpool    2.67G  13.2G     11      0   288K      0
^C
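
(For reference, one way to apply this kind of per-disk cap to a running libvirt/KVM guest is virsh blkdeviotune; the domain and device names below are placeholders, and this is only a sketch of the throttling approach, not necessarily the exact mechanism used above.)

# Cap each virtual disk of the guest at 3 total IOPS while it is running
for dev in vdb vdc vdd vde; do
    virsh blkdeviotune debian-8-zfs "$dev" --total-iops-sec 3 --live
done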

@tannerdsilva

@loli10K I'm experiencing the same issue with my BSD11-created pool. If this is at all helpful, my data is not mostly zeros. (My ticket is referenced above, #6038)

@rincebrain
Contributor Author

Drat, I thought I replied to this saying that even if my reproduction was broken, this was a legitimate problem someone was having that I was trying to reproduce.

Of course, here we are with someone on non-vacuous data having this issue. :)

@loli10K
Contributor

loli10K commented Apr 21, 2017

@rincebrain since this reproduction is broken, can we close this and keep the discussion in a single issue (#6038)?
