Processes that access zvols hang in D state #3667

Closed
Pentium100MHz opened this issue Aug 7, 2015 · 7 comments

@Pentium100MHz

Hi, I have a virtualization server running Debian 8; it uses KVM for virtualization and zvols for storage.

Everything was OK for two months and then for some reason all processes that access zvols got stuck in D state.

Here's the situation now:

zpool status - works
zfs create tank/fish - works
zfs destroy tank/fish - works
zfs create -V 1G tank/fish - D state
stack trace for this:

[<ffffffffa0656554>] tsd_set+0x64/0x320 [spl]
[<ffffffffa08eee75>] zvol_create_minor+0x15/0x60 [zfs]
[<ffffffffa08eeeca>] zvol_create_minors_cb+0xa/0x10 [zfs]
[<ffffffffa0855c69>] dmu_objset_find_impl+0xe9/0x3d0 [zfs]
[<ffffffffa08eeec0>] zvol_create_minors_cb+0x0/0x10 [zfs]
[<ffffffffa08eeec0>] zvol_create_minors_cb+0x0/0x10 [zfs]
[<ffffffffa0855f93>] dmu_objset_find+0x43/0x70 [zfs]
[<ffffffffa08c538d>] zfs_ioc_create+0x15d/0x280 [zfs]
[<ffffffffa08c306f>] zfsdev_ioctl+0x1df/0x4c0 [zfs]
[<ffffffff811ba2ff>] do_vfs_ioctl+0x2cf/0x4b0
[<ffffffff811ba561>] SyS_ioctl+0x81/0xa0
[<ffffffff81512e68>] page_fault+0x28/0x30
[<ffffffff81510e4d>] system_call_fast_compare_end+0x10/0x15
[<ffffffffffffffff>] 0xffffffffffffffff

dd if=/dev/zvol/tank/otherfish of=/dev/null - D state
stack trace for this:

[<ffffffffa08ed87e>] zvol_open+0x4e/0x2f0 [zfs]
[<ffffffff811dc67c>] __blkdev_get+0xcc/0x480
[<ffffffff811dcd80>] blkdev_open+0x0/0x80
[<ffffffff811dcbe6>] blkdev_get+0x1b6/0x310
[<ffffffff811dcd80>] blkdev_open+0x0/0x80
[<ffffffff811a5a72>] do_dentry_open+0x1f2/0x330
[<ffffffff811a5d7d>] finish_open+0x2d/0x40
[<ffffffff811b68d2>] do_last+0xa72/0x11e0
[<ffffffff811b2ef6>] link_path_walk+0x286/0x890
[<ffffffff811b73d4>] path_openat+0x394/0x680
[<ffffffff811b7e6a>] do_filp_open+0x3a/0x90
[<ffffffff811c3ecc>] __alloc_fd+0x7c/0x120
[<ffffffff811a72b9>] do_sys_open+0x129/0x220
[<ffffffff81510e4d>] system_call_fast_compare_end+0x10/0x15
[<ffffffffffffffff>] 0xffffffffffffffff

stack trace for a virtual machine stuck in D state:

[<ffffffffa08ee515>] zvol_release+0x35/0xa0 [zfs]
[<ffffffff811dc56d>] __blkdev_put+0x15d/0x1a0
[<ffffffff811dcfc1>] blkdev_close+0x21/0x30
[<ffffffff811a99ba>] __fput+0xca/0x1d0
[<ffffffff81085107>] task_work_run+0x97/0xd0
[<ffffffff81012ea9>] do_notify_resume+0x69/0xa0
[<ffffffff8151110a>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff

And for another:

[<ffffffff811dce1a>] blkdev_put+0x1a/0x110
[<ffffffff811dcfc1>] blkdev_close+0x21/0x30
[<ffffffff811a99ba>] __fput+0xca/0x1d0
[<ffffffff81085107>] task_work_run+0x97/0xd0
[<ffffffff81012ea9>] do_notify_resume+0x69/0xa0
[<ffffffff8151110a>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff

Versions:
OS: Debian 8.0
zfs: 0.6.4-1.1-1
spl: 0.6.4-1b

What is happening and is this fixable?
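
For anyone who wants to gather the same kind of data on an affected machine, here is a minimal sketch. The post does not say how the traces above were collected, so this is an assumption: it uses `ps` from procps to find tasks in D state and then reads `/proc/<pid>/stack`, which usually requires root and a kernel built with stack trace support. The function names `filter_d_state` and `list_d_state` are made up for this example.

```shell
#!/bin/sh
# Keep only the lines whose second field (the process state) starts with "D",
# i.e. tasks in uninterruptible sleep. Expects "PID STAT COMMAND" on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List every D-state process on the running system (headerless ps output).
list_d_state() {
    ps -eo pid=,stat=,comm= | filter_d_state
}

# Usage on a live system (as root), not executed here:
#   list_d_state
#   cat /proc/<pid>/stack    # kernel stack of one stuck task
```

Running `list_d_state` a few times in a row helps tell a permanently stuck task from one that is only briefly waiting on I/O.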

@akorn
Contributor

akorn commented Aug 11, 2015

How full is your pool?

@Pentium100MHz
Author

zfs list home

NAME                   USED  AVAIL  REFER  MOUNTPOINT
home                  1.20T   230G   613K  /home/

So the pool is not completely full (roughly 84% used), but there is not much space left.

The pool contains a single raidz1 vdev with 3 SSDs.

@Pentium100MHz
Author

I have found something that may or may not cause this:

There seems to be a race condition during zvol creation/access: if "zfs get all" runs at a particular moment after "zfs create", the freeze occurs.

How I found this out:

I created two scripts, create-destroy.php and get.php, and ran them both at the same time. Within a few hours the server was frozen: zvols could not be created or destroyed. After a reboot everything was OK for a few hours. When I introduced locking to make sure that only one "zfs" command was running at any one time, the problem went away.

create-destroy.php

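<?php
// Stress test: repeatedly create a sparse 1G zvol and destroy it again,
// picking one of ten names at random. Runs until interrupted.
while (true) {
    $rand = rand(1, 10);
    $rand2 = rand(0, 1);

    // Both commands run on the same coin flip, so every zvol that gets
    // created is destroyed immediately afterwards.
    if ($rand2 == 0) {
        shell_exec("zfs create -V 1G -s -o compression=lzjb home/abc/test." . $rand);
        shell_exec("zfs destroy home/abc/test." . $rand);
    }
}
?>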

get.php


<?php
// Query load: on each pass, either read all properties of one of the
// test zvols or list every dataset. Runs until interrupted.
while (true) {
    $rand = rand(1, 10);
    $rand2 = rand(0, 1);
    if ($rand2 == 0) {
        echo shell_exec("zfs get all home/abc/test." . $rand);
    } else {
        echo shell_exec("zfs list");
    }
}
?>
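
The locking workaround mentioned above could look roughly like the sketch below. The post does not say how the locking was actually implemented, so this is only one possible approach: it assumes `flock(1)` from util-linux is available, and both the lock file path and the function name `run_serialized` are hypothetical.

```shell
#!/bin/sh
# Hypothetical lock file; any path writable by every caller would do.
ZFS_LOCK="${ZFS_LOCK:-/tmp/zfs-serialize.lock}"

# Run the given command while holding an exclusive lock on $ZFS_LOCK.
# Callers that go through this wrapper never run their commands at the
# same time; a second caller blocks until the first one has finished.
# The exit status is that of the wrapped command.
run_serialized() {
    flock -x "$ZFS_LOCK" "$@"
}

# Usage with the commands from the scripts above (needs ZFS, not run here):
#   run_serialized zfs create -V 1G -s -o compression=lzjb home/abc/test.1
#   run_serialized zfs get all home/abc/test.1
#   run_serialized zfs destroy home/abc/test.1
```

This only serializes commands that are started through the wrapper. Anything that calls `zfs` directly (cron jobs, snapshot tools, management daemons) bypasses the lock, so it is a workaround for the race rather than a fix.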

@Pentium100MHz
Author

It may also have something to do with creating or destroying snapshots of zvols.

The issue also seems to be more common on 0.6.5.x: the servers running 0.6.4.x appear to have stopped freezing, while the ones running 0.6.5.x freeze about once a week.

@remyd1

remyd1 commented Jan 21, 2019

Hi,

It seems that I have a similar problem here with ZFS snapshots, using znapzend, tgtd and zvols on ZoL version 0.7.0.

The server works fine for a month or two, and then some processes end up in D state (zvol and tgtd first, then other processes).

I have 3 RAIDZ3 volumes using datasets and one other RAIDZ3 volume using zvols (with tgtd for iSCSI). Maybe mixing those is not a good idea.

Scanning the disks with smartmontools reported no errors.

Maybe I need to upgrade ZoL? The server is currently running, so I cannot run those PHP scripts on it.

Best regards,

5 participants
@behlendorf @remyd1 @akorn @Pentium100MHz and others