zvol_ENOSPC test suite failure #5848

Closed
ofaaland opened this issue Feb 28, 2017 · 5 comments
Labels
Component: Test Suite (Indicates an issue with the test framework or a test case)
Status: Stale (No recent activity for issue)

Comments


ofaaland commented Feb 28, 2017

System information

Type                   Version/Name
Distribution Name      TOSS/RHEL
Distribution Version   6.7
Linux Kernel           2.6.32-573.18.1.1chaos.ch5.4.x86_64
Architecture           x86_64
ZFS Version            66eead5
SPL Version            openzfs/spl@9704820

Describe the problem you're observing

The zvol_ENOSPC test suite is repeatedly failing in the automated tests performed on PRs by the buildbot. Specifically, zvol_ENOSPC_001_pos times out and is killed, and the cleanup step fails. Subsequent tests often fail as well. The failure does not appear to depend on the nature of the patch.

Test: /usr/share/zfs/zfs-tests/tests/functional/zvol/zvol_ENOSPC/setup (run as root) [00:02] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/zvol/zvol_ENOSPC/zvol_ENOSPC_001_pos (run as root) [10:00] [KILLED]
Test: /usr/share/zfs/zfs-tests/tests/functional/zvol/zvol_ENOSPC/cleanup (run as root) [00:00] [FAIL]

I've given my VM's specific distro/kernel/etc above but do not know if this issue is specific to this environment.

Describe how to reproduce the problem

I believe I've reproduced the problem in a VM on my desktop:

  • truncate -s 64M /tmp/file.1
  • sudo cmd/zpool/zpool create frogger /tmp/file.1
  • sudo cmd/zfs/zfs create -V 8M frogger/myvol
  • sudo strace -T dd if=/dev/urandom of=/dev/zd0

The dd does eventually get ENOSPC in response to a write() call. However, the close() call for the output file (the zvol's device) takes a very long time to return: about 100 seconds in the example I captured.

read(0, "\203\242\247\30\216Y\355Tp\323\371\17\300\30&q'\210\16\314\301\360i0\242\356*O\305\rT5"..., 512) = 512 <0.000140>                                                          
write(1, "\203\242\247\30\216Y\355Tp\323\371\17\300\30&q'\210\16\314\301\360i0\242\356*O\305\rT5"..., 512) = -1 ENOSPC (No space left on device) <0.000008>                         

and then

close(0)                                = 0 <0.000027>                                    
close(1)                                = 0 <100.729140> 
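
The steps above can be wrapped in one small script; this is only a sketch, assuming an in-tree build (hence the cmd/zpool and cmd/zfs paths) and that the new zvol shows up as /dev/zd0 as it did here:

#!/bin/sh
# Sketch: fill a tiny zvol until ENOSPC and time the final close() with strace.
# Assumes an in-tree build and that frogger/myvol appears as /dev/zd0.
truncate -s 64M /tmp/file.1
sudo cmd/zpool/zpool create frogger /tmp/file.1
sudo cmd/zfs/zfs create -V 8M frogger/myvol
# dd hits ENOSPC almost immediately; the interesting number is the <...> time
# strace reports next to close(1) on the zvol device.
sudo strace -T -e trace=write,close dd if=/dev/urandom of=/dev/zd0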

Include any warning/errors/backtraces from the system logs

I saw two stacks while waiting on the close() call to return. I don't know if either is a useful clue:

[faaland1@locks branch:b_refcount_t zfs] $sudo cat /proc/23312/stack
[<ffffffff81127acd>] sync_page+0x3d/0x50                            
[<ffffffff81127a67>] __lock_page+0x67/0x70                          
[<ffffffff8113cebd>] write_cache_pages+0x38d/0x4c0                  
[<ffffffff8113d014>] generic_writepages+0x24/0x30
[<ffffffff8113d041>] do_writepages+0x21/0x40
[<ffffffff8112828b>] __filemap_fdatawrite_range+0x5b/0x60
[<ffffffff811287ef>] filemap_fdatawrite+0x1f/0x30
[<ffffffff81128835>] filemap_write_and_wait+0x35/0x60
[<ffffffff811d0f94>] __sync_blockdev+0x24/0x50
[<ffffffff811d0fd3>] sync_blockdev+0x13/0x20
[<ffffffff811d1168>] __blkdev_put+0x188/0x1c0
[<ffffffff811d11b0>] blkdev_put+0x10/0x20
[<ffffffff811d11f3>] blkdev_close+0x33/0x60
[<ffffffff81194858>] __fput+0x108/0x230
[<ffffffff811949a5>] fput+0x25/0x30
[<ffffffff8118fa6d>] filp_close+0x5d/0x90
[<ffffffff8118fb45>] sys_close+0xa5/0x100
[<ffffffff8100b328>] tracesys+0xd9/0xde
[<ffffffffffffffff>] 0xffffffffffffffff

and

[faaland1@locks branch:b_refcount_t zfs] $sudo cat /proc/23312/stack
[<ffffffffa028523d>] cv_wait_common+0x15d/0x2b0 [spl]
[<ffffffffa02853e5>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa045a063>] txg_wait_open+0xf3/0x1d0 [zfs]
[<ffffffffa03f9b3c>] dmu_tx_wait+0x4ac/0x510 [zfs]
[<ffffffffa03f9c8c>] dmu_tx_assign+0xec/0x7a0 [zfs]
[<ffffffffa04d972c>] zvol_request+0x4ec/0xa20 [zfs]
[<ffffffff81277062>] generic_make_request+0x2b2/0x660
[<ffffffff81277480>] submit_bio+0x70/0x120
[<ffffffff811c927d>] submit_bh+0x11d/0x1f0
[<ffffffff811cbb38>] __block_write_full_page+0x1d8/0x340
[<ffffffff811cbd80>] block_write_full_page_endio+0xe0/0x120
[<ffffffff811cbdd5>] block_write_full_page+0x15/0x20
[<ffffffff811d0c08>] blkdev_writepage+0x18/0x20
[<ffffffff8113ba27>] __writepage+0x17/0x40
[<ffffffff8113cd48>] write_cache_pages+0x218/0x4c0
[<ffffffff8113d014>] generic_writepages+0x24/0x30
[<ffffffff8113d041>] do_writepages+0x21/0x40
[<ffffffff8112828b>] __filemap_fdatawrite_range+0x5b/0x60
[<ffffffff811287ef>] filemap_fdatawrite+0x1f/0x30
[<ffffffff81128835>] filemap_write_and_wait+0x35/0x60
[<ffffffff811d0f94>] __sync_blockdev+0x24/0x50
[<ffffffff811d0fd3>] sync_blockdev+0x13/0x20
[<ffffffff811d1168>] __blkdev_put+0x188/0x1c0
[<ffffffff811d11b0>] blkdev_put+0x10/0x20
[<ffffffff811d11f3>] blkdev_close+0x33/0x60
[<ffffffff81194858>] __fput+0x108/0x230
[<ffffffff811949a5>] fput+0x25/0x30
[<ffffffff8118fa6d>] filp_close+0x5d/0x90
[<ffffffff8118fb45>] sys_close+0xa5/0x100
[<ffffffff8100b328>] tracesys+0xd9/0xde
[<ffffffffffffffff>] 0xffffffffffffffff
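
For what it's worth, the second stack is sitting in txg_wait_open() via dmu_tx_assign(), i.e. the page flush triggered by blkdev_close() is waiting for transaction groups on the full pool. A minimal sketch for watching whether txgs are still progressing while close() hangs, assuming the per-pool txgs kstat is available at the usual /proc/spl/kstat path and the pool is named frogger as in the repro steps:

# Sketch: watch txg progress for the frogger pool while close(1) is blocked.
watch -n 1 'tail -n 5 /proc/spl/kstat/zfs/frogger/txgs'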

ofaaland commented Feb 28, 2017

Note also that the close() time varies. I performed the dd ten times with strace -T and recorded the time for the close() call:

[faaland1@locks branch:spl-0.7.0-rc3) spl] $fgrep 'close(1)' /tmp/tentimes
close(1)                                = 0 <2.076215>
close(1)                                = 0 <49.508769>
close(1)                                = 0 <18.009149>
close(1)                                = 0 <2.174923>
close(1)                                = 0 <104.548481>
close(1)                                = 0 <82.444687>
close(1)                                = 0 <100.997135>
close(1)                                = 0 <15.332051>
close(1)                                = 0 <108.203375>
close(1)                                = 0 <76.234823>
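
A sketch of how such a sample could be collected, assuming frogger/myvol from the repro steps is still /dev/zd0 and reusing the /tmp/tentimes path from above:

#!/bin/sh
# Sketch: repeat the dd ten times under strace -T and collect the close(1) times.
: > /tmp/tentimes
for i in $(seq 1 10); do
    sudo strace -T -e trace=close dd if=/dev/urandom of=/dev/zd0 2>>/tmp/tentimes
done
fgrep 'close(1)' /tmp/tentimes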

@ofaaland

I captured the stack 100 times in a tight loop during one of the slow close(1) calls, like a handmade perf. The longer stack, with zvol_request, appeared in all 100.
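
A sketch of that sampling loop (the PID of the blocked dd and the output path are placeholders):

# Sketch: poor man's profiler against the dd process stuck in close(1).
# Replace $PID with the dd's actual pid; take 100 samples, then count frames.
for i in $(seq 1 100); do
    sudo cat /proc/$PID/stack
    echo ---
done > /tmp/stacks
grep -c zvol_request /tmp/stacks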


ofaaland commented Feb 28, 2017

In my test, spl and zfs were built with debug enabled, in case that matters, and I was running in-tree.


ofaaland commented Feb 28, 2017

The comment quoted below, which I posted earlier, is wrong; I had accidentally changed the amount of data written with dd:

I cannot recreate this behavior with zfs/10a3f88 (Tag 0.7.0-rc3).  With that build, the first
dd to the zvol results in close() taking 2 seconds, but after that they take about 0.6 seconds each.

@behlendorf added the Component: Test Suite label Nov 13, 2017
behlendorf added a commit to behlendorf/zfs that referenced this issue Nov 13, 2017
Occasionally observed failure of zvol_ENOSPC_001_pos due to the
test case taking too long to complete.  Disable the test case until
it can be improved.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#5848
behlendorf added a commit that referenced this issue Nov 14, 2017
Occasionally observed failure of zvol_ENOSPC_001_pos due to the
test case taking too long to complete.  Disable the test case until
it can be improved.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5848 
Closes #6862
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Jan 29, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 28, 2019

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

The stale bot added the Status: Stale label Aug 25, 2020
The stale bot closed this as completed Nov 24, 2020