resilver_restart_001 intermittently fails #9677

Closed
jwk404 opened this issue Dec 4, 2019 · 3 comments · Fixed by #9703
Labels
Component: Test Suite (indicates an issue with the test framework or a test case)

Comments


jwk404 commented Dec 4, 2019

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 18.04 |
| Linux Kernel | 4.15 |
| Architecture | x86 |
| ZFS Version | 5a08977 |
| SPL Version | |

Describe the problem you're observing

resilver_restart_001 fails intermittently, possibly because of a race in verify_restarts().

Describe how to reproduce the problem

The test fails intermittently.

Include any warning/errors/backtraces from the system logs

04:11:20.87 NOTE: expected 1 resilver start(s) after offline/online, found 1
04:11:22.08 Added handler 59 with the following properties:
04:11:22.08   pool: testpool
04:11:22.08   vdev: 8f34acbcad36f507
04:11:22.08 SUCCESS: zinject -a -d /var/tmp/file-3 -e io -T read -f 0.25 testpool
04:11:23.46 removed all registered handlers
04:11:23.46 SUCCESS: zinject -c all
04:11:23.48 NOTE: expected 1 resilver start(s) after zinject, found 1
04:11:27.93 SUCCESS: zpool sync testpool
04:11:27.94 NOTE: expected 2 resilver start(s) after resilver, found 1
04:11:27.94 expected 2 resilver start(s) after resilver, found 1
04:11:27.95 NOTE: Performing test-fail callback (/usr/share/zfs/zfs-tests/callbacks/zfs_dbgmsg.ksh)
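
The log above shows a check that compares an expected number of resilver starts with the number found. As a minimal sketch of that kind of check (not the test suite's actual `verify_restarts()` code), the helpers below count event lines and compare the result. The function names `count_resilver_starts` and `check_restarts` are hypothetical, and matching on the class string `sysevent.fs.zfs.resilver_start` is an assumption about how the events are identified.

```shell
# Hypothetical helper: count resilver start events in event-log text on stdin.
# Matching on this class string is an assumption, not taken from the issue.
count_resilver_starts() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true
    grep -c 'sysevent.fs.zfs.resilver_start' || true
}

# Hypothetical helper: print a note in the style of the log above and
# return success only if the found count equals the expected count.
# usage: check_restarts <expected> <label> <found>
check_restarts() {
    expected=$1 label=$2 found=$3
    echo "NOTE: expected $expected resilver start(s) $label, found $found"
    [ "$found" -eq "$expected" ]
}
```

On a system with a pool, this could be driven with something like `check_restarts 2 "after resilver" "$(zpool events | count_resilver_starts)"`. A check of this shape reads the event log once, so it can report a stale count if the event it is waiting for has not been posted yet or is no longer in the log.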
behlendorf added the Component: Test Suite label Dec 4, 2019

behlendorf commented

I've recently noticed this as well. @jwpoduska would you mind looking into this? #9588 added this test case. Here are two example failures hit by the CI. It's failing in about 2% of CI test runs.

http://build.zfsonlinux.org/builders/Amazon%202%20x86_64%20Release%20%28TEST%29/builds/7645
http://build.zfsonlinux.org/builders/CentOS%207%20x86_64%20%28TEST%29/builds/9098

jwpoduska commented

I'm looking at it, but there is nothing to report yet.

jwpoduska added a commit to datto/zfs that referenced this issue Dec 6, 2019
The resilver restart test was reported as failing about 2% of the
time. Two issues were found:
- The event log wasn't large enough, so resilver events were missing
- One 'zpool sync' wasn't enough for resilver to start after zinject

Signed-off-by: John Poduska <[email protected]>
Closes openzfs#9677
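
The first finding can be illustrated with a small simulation. If an event log keeps only its newest N entries, an early resilver start can roll off before the test counts it, so the count comes up short. The sketch below models that general idea with `tail`; it is not how ZFS stores events, and `bounded_log`, `make_events`, and the filler event are hypothetical. On Linux, OpenZFS exposes a `zfs_zevent_len_max` module parameter that bounds the number of queued events; whether the fix changed that parameter is not stated in this thread.

```shell
# Simulate a bounded event log: keep only the newest $1 lines of stdin.
# This models the idea only; it is not ZFS's implementation.
bounded_log() {
    tail -n "$1"
}

# Build a hypothetical event stream: two resilver starts separated by
# $1 unrelated events.
make_events() {
    echo "sysevent.fs.zfs.resilver_start"
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "sysevent.fs.zfs.history_event"
        i=$((i + 1))
    done
    echo "sysevent.fs.zfs.resilver_start"
}

# 12 events into a log of 8: the first start has rolled off.
make_events 10 | bounded_log 8 | grep -c 'resilver_start'     # prints 1
# The same events into a log of 64: both starts are still present.
make_events 10 | bounded_log 64 | grep -c 'resilver_start'    # prints 2
```

The undercount in the first case has the same shape as the "expected 2 resilver start(s) after resilver, found 1" failure reported above, although the log excerpt alone does not show which of the two causes produced that particular failure.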
PrivatePuffin commented

@jwpoduska I opened a PR for your fix, so it's clear to everyone that a candidate fix for this exists :)

jwpoduska added a commit to datto/zfs that referenced this issue Dec 9, 2019
behlendorf pushed a commit that referenced this issue Dec 10, 2019
The resilver restart test was reported as failing about 2% of the
time. Two issues were found:

- The event log wasn't large enough, so resilver events were missing
- One 'zpool sync' wasn't enough for resilver to start after zinject

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: John Poduska <[email protected]>
Issue #9588 
Closes #9677 
Closes #9703
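
The second finding says that a single `zpool sync` did not always leave enough time for the resilver to start. One general way to make such a check less timing-sensitive is to retry it a bounded number of times rather than checking once. The sketch below shows that pattern only and is not the change merged in #9703; `wait_for_count`, `WAIT_DELAY`, and `current_starts` are hypothetical names.

```shell
# Hypothetical helper: run a command up to <max_tries> times until its
# output equals <expected>. Returns 0 on a match, 1 if it never matches.
# usage: wait_for_count <max_tries> <expected> <command...>
wait_for_count() {
    tries=$1 expected=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        found=$("$@")
        [ "$found" = "$expected" ] && return 0
        n=$((n + 1))
        # Delay between attempts; WAIT_DELAY is a hypothetical knob (seconds).
        sleep "${WAIT_DELAY:-1}"
    done
    return 1
}

# Possible use on a system with a pool (not run here):
#   current_starts() {
#       zpool sync testpool
#       zpool events | grep -c 'resilver_start' || true
#   }
#   wait_for_count 5 2 current_starts
```

Bounding the number of attempts keeps a real regression from hanging the test run, while allowing several syncs to pass before the count is treated as final.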
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jan 22, 2020
tonyhutter pushed a commit that referenced this issue Jan 23, 2020