Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU lock During the ZFS+Zvol #2272

Closed
albertyann opened this issue Apr 22, 2014 · 16 comments
Closed

CPU lock During the ZFS+Zvol #2272

albertyann opened this issue Apr 22, 2014 · 16 comments

Comments

@albertyann
Copy link

I am running the CentOS 6.5, ZFS-0.6.2

Here is the output for running ext4 on zvol. Please using 500G zvol and block size =4096 and try to mkfs.ext4, you will see the result. This may related to the slow mkfs.ext4.

NFO: task zvol/3:3945 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/3        D 0000000000000003     0  3945      2 0x00000080
 ffff8804301e5bb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff8804301e5b70 ffffffffa044b619
 ffff8804376dc5f8 ffff8804301e5fd8 000000000000fbc8 ffff8804376dc5f8
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/4:3946 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/4        D 0000000000000009     0  3946      2 0x00000080
 ffff880433063bb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff880433063b70 ffffffffa044b619
 ffff880436651098 ffff880433063fd8 000000000000fbc8 ffff880436651098
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/7:3949 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/7        D 0000000000000000     0  3949      2 0x00000080
 ffff88043660fbb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff88043660fb70 ffffffffa044b619
 ffff880436ed85f8 ffff88043660ffd8 000000000000fbc8 ffff880436ed85f8
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/8:3950 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/8        D 0000000000000002     0  3950      2 0x00000080
 ffff880432fd3bb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff880432fd3b70 ffffffffa044b619
 ffff880434462638 ffff880432fd3fd8 000000000000fbc8 ffff880434462638
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/9:3951 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/9        D 0000000000000007     0  3951      2 0x00000080
 ffff880432f79bb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff880432f79b70 ffffffffa044b619
 ffff880436c36638 ffff880432f79fd8 000000000000fbc8 ffff880436c36638
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/10:3952 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/10       D 000000000000000b     0  3952      2 0x00000080
 ffff880437a0dbb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff880437a0db70 ffffffffa044b619
 ffff880439a925f8 ffff880437a0dfd8 000000000000fbc8 ffff880439a925f8
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/11:3953 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/11       D 0000000000000007     0  3953      2 0x00000080
 ffff880432fc5bb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff880432fc5b70 ffffffffa044b619
 ffff880439a93058 ffff880432fc5fd8 000000000000fbc8 ffff880439a93058
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/17:3959 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/17       D 0000000000000004     0  3959      2 0x00000080
 ffff880437c9fbb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff880437c9fb70 ffffffffa044b619
 ffff880437a93ab8 ffff880437c9ffd8 000000000000fbc8 ffff880437a93ab8
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/19:3961 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/19       D 0000000000000001     0  3961      2 0x00000080
 ffff8804330ddbb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff8804330ddb70 ffffffffa044b619
 ffff880437410638 ffff8804330ddfd8 000000000000fbc8 ffff880437410638
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task zvol/21:3963 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zvol/21       D 0000000000000003     0  3963      2 0x00000080
 ffff8804330c7bb0 0000000000000046 0000000000000002 0000000400000000
 0000000000000000 0000000000000002 ffff8804330c7b70 ffffffffa044b619
 ffff880434f9faf8 ffff8804330c7fd8 000000000000fbc8 ffff880434f9faf8
Call Trace:
 [<ffffffffa044b619>] ? dbuf_hold_impl+0x99/0xc0 [zfs]
 [<ffffffff8109b4ee>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa0378725>] cv_wait_common+0x105/0x1c0 [spl]
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0378835>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa045f89d>] dmu_tx_wait+0x8d/0x2b0 [zfs]
 [<ffffffffa045fba1>] dmu_tx_assign+0x91/0x490 [zfs]
 [<ffffffffa04528dd>] dmu_free_long_range+0x18d/0x250 [zfs]
 [<ffffffffa04ef9aa>] zvol_discard+0xaa/0x110 [zfs]
 [<ffffffffa0371568>] taskq_thread+0x218/0x4b0 [spl]
 [<ffffffff81527910>] ? thread_return+0x4e/0x76e
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0371350>] ? taskq_thread+0x0/0x4b0 [spl]
 [<ffffffff8109aef6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

I find the #2208 maybe the same problem whit me.

AL

@behlendorf behlendorf added this to the 0.6.4 milestone Apr 22, 2014
@behlendorf behlendorf added the Bug label Apr 22, 2014
@iceisfun
Copy link

I tried to replicate this on master

root@iceisfunzfs:# zfs create -V 256G tank/Volume_Test
root@iceisfunzfs:
# ls /mnt
root@iceisfunzfs:~# fdisk /dev/zvol/tank/Volume_Test
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xb6566b3f.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): p

Disk /dev/zvol/tank/Volume_Test: 274.9 GB, 274877906944 bytes
16 heads, 63 sectors/track, 532610 cylinders, total 536870912 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk identifier: 0xb6566b3f

                 Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
Using default value 1
First sector (2048-536870911, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-536870911, default 536870911):
Using default value 536870911

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): p

Disk /dev/zvol/tank/Volume_Test: 274.9 GB, 274877906944 bytes
16 heads, 63 sectors/track, 532610 cylinders, total 536870912 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disk identifier: 0xb6566b3f

                 Device Boot      Start         End      Blocks   Id  System

/dev/zvol/tank/Volume_Test1 2048 536870911 268434432 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
root@iceisfunzfs:~# mkfs.ext4 /dev/zvol/tank/Volume_Test
mke2fs 1.42.5 (29-Jul-2012)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=2 blocks
16777216 inodes, 67108864 blocks
3355443 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2048 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

root@iceisfunzfs:~#

@iceisfun
Copy link

Again with block size 4096

root@iceisfunzfs:# zfs create -V 256G -b 4096 tank/Volume_Test2
root@iceisfunzfs:
# fdisk /dev/zvol/tank/Volume_Test2
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x8bb0bfa6.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
Using default value 1
First sector (2048-536870911, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-536870911, default 536870911):
Using default value 536870911

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): p

Disk /dev/zvol/tank/Volume_Test2: 274.9 GB, 274877906944 bytes
16 heads, 63 sectors/track, 532610 cylinders, total 536870912 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x8bb0bfa6

                   Device Boot      Start         End      Blocks   Id  System

/dev/zvol/tank/Volume_Test2p1 2048 536870911 268434432 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
root@iceisfunzfs:# mkfs.ext4 /dev/zvol/tank/Volume_Test2
Volume_Test2 Volume_Test2-part1
root@iceisfunzfs:
# mkfs.ext4 /dev/zvol/tank/Volume_Test2-part1
mke2fs 1.42.5 (29-Jul-2012)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
16777216 inodes, 67108608 blocks
3355430 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2048 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

root@iceisfunzfs:~#

@albertyann
Copy link
Author

Thanks @iceisfun , create ext4.zvol is no problem. When I try some high I/O load test that problem is happen.

@iceisfun
Copy link

I am unable to create the condition you've described even under heavy load. Tomorrow I will try to recreate the issue with a different load test.

I did however manage to get my system load numbers up over 85
03:42:17 up 3 days, 9:52, 2 users, load average: 85.45, 85.34, 83.69
in this case spl_kmem_cache was taking ~100% cpu and kswapd was pretty busy too

Under this heavy load I got some different error messages.
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692041] zvol/0 D ffff88021fc93780 0 1296 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692045] ffff880213140080 0000000000000046 ffff880071a82400 ffff880212ae8770
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692049] 0000000000013780 ffff88020e8abfd8 ffff88020e8abfd8 ffff880213140080
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692053] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692057] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692065] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692075] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692079] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692082] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692085] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692089] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692108] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692111] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692114] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692129] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692143] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692148] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692152] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692156] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692159] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692163] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692166] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692168] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692176] zvol/1 D ffff88021fc93780 0 1297 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692179] ffff8802129aaa30 0000000000000046 ffff880071a82400 ffff880211b07590
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692183] 0000000000013780 ffff88021462bfd8 ffff88021462bfd8 ffff8802129aaa30
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692187] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692190] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692193] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692197] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692200] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692202] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692205] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692207] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692222] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692225] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692227] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692242] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692255] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692258] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692263] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692266] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692270] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692273] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692276] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692278] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692281] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692289] zvol/2 D ffff88021fc93780 0 1298 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692292] ffff880213140770 0000000000000046 ffff880071a82400 ffff880213e56fa0
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692295] 0000000000013780 ffff88020ef51fd8 ffff88020ef51fd8 ffff880213140770
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692299] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692302] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692305] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692310] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692312] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692315] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692317] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692320] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692334] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692337] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692339] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692354] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692368] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692370] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692375] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692378] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692382] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692385] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692388] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692391] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692393] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692401] zvol/3 D ffff88021fc13780 0 1299 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692404] ffff88020ff2f510 0000000000000046 0000000000000fbc ffffffff8160d020
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692408] 0000000000013780 ffff88020b1ebfd8 ffff88020b1ebfd8 ffff88020ff2f510
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692411] 0000000000000202 ffffffff8134fd89 ffff88021327f320 ffff88021327f318
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692415] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692417] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692422] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692425] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692427] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692446] [] ? txg_wait_open+0xbe/0xf4 [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692461] [] ? dmu_tx_wait+0x21c/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692463] [] ? mutex_lock+0xd/0x2d
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692478] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692492] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692494] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692499] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692502] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692506] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692509] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692512] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692515] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692517] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692525] zvol/4 D ffff88021fd13780 0 1300 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692529] ffff8802138ee340 0000000000000046 ffff880071a82400 ffff880216d59610
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692532] 0000000000013780 ffff88020ee19fd8 ffff88020ee19fd8 ffff8802138ee340
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692536] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692539] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692542] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692547] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692549] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692552] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692554] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692557] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692571] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692574] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692576] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692591] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692604] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692607] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692612] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692615] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692619] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692622] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692625] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692627] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692630] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692638] zvol/5 D ffff88021fc93780 0 1301 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692641] ffff880213f75710 0000000000000046 00000000000000f6 ffff880216d160c0
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692645] 0000000000013780 ffff88020cdc5fd8 ffff88020cdc5fd8 ffff880213f75710
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692648] 0000000000000202 ffffffff8134fd89 ffff88021327f320 ffff88021327f318
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692652] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692654] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692659] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692662] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692664] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692682] [] ? txg_wait_open+0xbe/0xf4 [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692697] [] ? dmu_tx_wait+0x21c/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692700] [] ? mutex_lock+0xd/0x2d
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692714] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692728] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692730] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692735] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692738] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692742] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692745] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692748] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692750] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692753] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692762] zvol/6 D ffff88021fd13780 0 1302 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692765] ffff88020e83f810 0000000000000046 ffff880071a82400 ffff880216d59610
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692769] 0000000000013780 ffff880215d7dfd8 ffff880215d7dfd8 ffff88020e83f810
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692772] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692776] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692778] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692783] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692786] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692788] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692791] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692793] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692808] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692810] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692813] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692827] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692841] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692844] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692848] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692851] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692856] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692858] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692861] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692864] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692866] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692875] zvol/7 D ffff88021fd13780 0 1303 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692879] ffff880213ab0180 0000000000000046 0000000000000000 ffff880216d59610
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692882] 0000000000013780 ffff88020e009fd8 ffff88020e009fd8 ffff880213ab0180
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692886] 0000000000000202 000000018134fd89 ffff88021327f320 ffff88021327f318
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692889] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692894] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692897] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692899] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692917] [] ? txg_wait_open+0xbe/0xf4 [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692932] [] ? dmu_tx_wait+0x21c/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692935] [] ? mutex_lock+0xd/0x2d
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692949] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692963] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692965] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692970] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692973] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692977] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692980] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692983] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692986] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692988] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.692997] zvol/8 D ffff88021fc93780 0 1304 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693000] ffff880213e568b0 0000000000000046 ffff880071a82400 ffff880212ae8770
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693004] 0000000000013780 ffff88020e2a5fd8 ffff88020e2a5fd8 ffff880213e568b0
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693007] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693011] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693013] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693018] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693021] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693023] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693026] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693028] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693043] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693045] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693047] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693062] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693076] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693078] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693083] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693086] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693090] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693093] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693096] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693099] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693101] [] ? gs_change+0x13/0x13
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693110] zvol/9 D ffff88021fc13780 0 1305 2 0x00000000
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693113] ffff8802139df850 0000000000000046 ffff880071a82400 ffff880213b911e0
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693117] 0000000000013780 ffff88020e8d9fd8 ffff88020e8d9fd8 ffff8802139df850
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693120] 0000000000000202 ffffffff8134fd89 ffff88021327f158 ffff88021327f150
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693124] Call Trace:
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693126] [] ? _raw_spin_lock_irqsave+0x9/0x25
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693131] [] ? cv_wait_common+0xdf/0x175 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693134] [] ? paravirt_read_tsc+0x5/0x8
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693136] [] ? read_tsc+0x5/0x14
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693139] [] ? add_wait_queue+0x3c/0x3c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693141] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693156] [] ? dmu_tx_wait+0x6a/0x23d [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693158] [] ? should_resched+0x5/0x23
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693161] [] ? _cond_resched+0x7/0x1c
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693175] [] ? dmu_tx_assign+0xef/0x36e [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693189] [] ? zvol_write+0xc3/0x3be [zfs]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693192] [] ? __wake_up+0x35/0x46
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693196] [] ? taskq_thread+0x290/0x3e6 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693199] [] ? try_to_wake_up+0x197/0x197
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693203] [] ? task_done+0x110/0x110 [spl]
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693206] [] ? kthread+0x76/0x7e
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693209] [] ? kernel_thread_helper+0x4/0x10
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693212] [] ? kthread_worker_fn+0x139/0x139
Apr 23 22:33:58 iceisfunzfs kernel: [276240.693214] [] ? gs_change+0x13/0x13
Apr 24 03:10:00 iceisfunzfs kernel: [292802.857945] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:11:00 iceisfunzfs kernel: [292862.855660] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:12:00 iceisfunzfs kernel: [292922.853300] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:13:00 iceisfunzfs kernel: [292982.851643] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:14:00 iceisfunzfs kernel: [293042.849805] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:15:00 iceisfunzfs kernel: [293102.847936] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:16:00 iceisfunzfs kernel: [293162.845029] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:17:00 iceisfunzfs kernel: [293222.843459] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:18:00 iceisfunzfs kernel: [293282.841011] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:19:00 iceisfunzfs kernel: [293342.838926] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)
Apr 24 03:20:00 iceisfunzfs kernel: [293402.837177] iscsi_trgt: Logical Unit Reset (05) issued on tid:2 lun:0 by sid:844425852944448 (Function Complete)

@iceisfun
Copy link

I was able to get my test machine into a non-responsive state doing the above testing with a zvol, bonnie++ and ext4 - I believe in my case the issue came down to the machine not having enough memory / zfs allowed to use too much memory

@behlendorf
Copy link
Contributor

Does anyone know if this is still an issue with the 0.6.3 version or newer?

@behlendorf behlendorf removed this from the 0.6.4 milestone Dec 20, 2014
@usrflo
Copy link

usrflo commented Mar 1, 2015

I am running Debian Wheezy, zfs* 0.6.3-1~wheezy

Under heavy load the system becomes stalled and has to be rebooted about once every 2 weeks. This time the system came back after an hour and I found the following stacktraces.

Additional information: swap space was configured according to https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-use-a-zvol-as-a-swap-device.

# free
             total       used       free     shared    buffers     cached
Mem:      32702260   32364100     338160          0        408      35316
-/+ buffers/cache:   32328376     373884
Swap:     33554428      37220   33517208

[1013751.840885] INFO: task kswapd0:46 blocked for more than 120 seconds.
[1013751.840920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013751.840948] kswapd0 D ffff88081fbd2780 0 46 2 0x00000000
[1013751.840950] ffff88081b8b0180 0000000000000046 ffff880400000000 ffff88081b87d1c0
[1013751.840952] 0000000000012780 ffff8807fc699fd8 ffff8807fc699fd8 ffff88081b8b0180
[1013751.840954] ffffffff81036618 000000018134fa74 dead000000100100 ffff8807ce58e210
[1013751.840956] Call Trace:
[1013751.840961] [] ? should_resched+0x5/0x23
[1013751.840964] [] ? __mutex_lock_common.isra.5+0xff/0x164
[1013751.840965] [] ? should_resched+0x5/0x23
[1013751.840967] [] ? mutex_lock+0x1a/0x2d
[1013751.840969] [] ? kfree+0x5b/0x6c
[1013751.840988] [] ? zfs_zinactive+0x4a/0xac [zfs]
[1013751.840997] [] ? zfs_inactive+0x102/0x13b [zfs]
[1013751.841000] [] ? evict+0x96/0x148
[1013751.841001] [] ? dispose_list+0x27/0x31
[1013751.841003] [] ? prune_icache_sb+0x250/0x25f
[1013751.841006] [] ? prune_super+0xdf/0x149
[1013751.841008] [] ? shrink_slab+0x18f/0x24d
[1013751.841009] [] ? balance_pgdat+0x283/0x4b7
[1013751.841011] [] ? kswapd+0x2e0/0x319
[1013751.841013] [] ? add_wait_queue+0x3c/0x3c
[1013751.841015] [] ? balance_pgdat+0x4b7/0x4b7
[1013751.841017] [] ? kthread+0x76/0x7e
[1013751.841019] [] ? kernel_thread_helper+0x4/0x10
[1013751.841021] [] ? kthread_worker_fn+0x139/0x139
[1013751.841022] [] ? gs_change+0x13/0x13
[1013751.841046] INFO: task txg_quiesce:729 blocked for more than 120 seconds.
[1013751.841069] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013751.841097] txg_quiesce D ffff88081fa12780 0 729 2 0x00000000
[1013751.841099] ffff8807f1467080 0000000000000046 0000000000000000 ffffffff8160d020
[1013751.841101] 0000000000012780 ffff8807ed6edfd8 ffff8807ed6edfd8 ffff8807f1467080
[1013751.841102] 0000000000000202 0000000181350e01 ffffc90013a294e8 ffffc90013a294e0
[1013751.841104] Call Trace:
[1013751.841109] [] ? cv_wait_common+0xdf/0x175 [spl]
[1013751.841111] [] ? paravirt_read_tsc+0x5/0x8
[1013751.841113] [] ? read_tsc+0x5/0x14
[1013751.841115] [] ? add_wait_queue+0x3c/0x3c
[1013751.841116] [] ? should_resched+0x5/0x23
[1013751.841127] [] ? txg_quiesce_thread+0x23f/0x300 [zfs]
[1013751.841136] [] ? txg_sync_thread+0x507/0x507 [zfs]
[1013751.841140] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
[1013751.841143] [] ? __thread_create+0x2bb/0x2bb [spl]
[1013751.841144] [] ? kthread+0x76/0x7e
[1013751.841146] [] ? kernel_thread_helper+0x4/0x10
[1013751.841148] [] ? kthread_worker_fn+0x139/0x139
[1013751.841150] [] ? gs_change+0x13/0x13
[1013871.768734] INFO: task kswapd0:46 blocked for more than 120 seconds.
[1013871.768760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.768789] kswapd0 D ffff88081fbd2780 0 46 2 0x00000000
[1013871.768791] ffff88081b8b0180 0000000000000046 ffff880400000000 ffff88081b87d1c0
[1013871.768794] 0000000000012780 ffff8807fc699fd8 ffff8807fc699fd8 ffff88081b8b0180
[1013871.768796] ffffffff81036618 000000018134fa74 dead000000100100 ffff8807ce58e210
[1013871.768798] Call Trace:
[1013871.768802] [] ? should_resched+0x5/0x23
[1013871.768805] [] ? __mutex_lock_common.isra.5+0xff/0x164
[1013871.768807] [] ? should_resched+0x5/0x23
[1013871.768809] [] ? mutex_lock+0x1a/0x2d
[1013871.768811] [] ? kfree+0x5b/0x6c
[1013871.768830] [] ? zfs_zinactive+0x4a/0xac [zfs]
[1013871.768839] [] ? zfs_inactive+0x102/0x13b [zfs]
[1013871.768842] [] ? evict+0x96/0x148
[1013871.768843] [] ? dispose_list+0x27/0x31
[1013871.768845] [] ? prune_icache_sb+0x250/0x25f
[1013871.768847] [] ? prune_super+0xdf/0x149
[1013871.768850] [] ? shrink_slab+0x18f/0x24d
[1013871.768851] [] ? balance_pgdat+0x283/0x4b7
[1013871.768853] [] ? kswapd+0x2e0/0x319
[1013871.768856] [] ? add_wait_queue+0x3c/0x3c
[1013871.768857] [] ? balance_pgdat+0x4b7/0x4b7
[1013871.768859] [] ? kthread+0x76/0x7e
[1013871.768861] [] ? kernel_thread_helper+0x4/0x10
[1013871.768863] [] ? kthread_worker_fn+0x139/0x139
[1013871.768865] [] ? gs_change+0x13/0x13
[1013871.768888] INFO: task txg_quiesce:729 blocked for more than 120 seconds.
[1013871.768912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.768940] txg_quiesce D ffff88081fa12780 0 729 2 0x00000000
[1013871.768942] ffff8807f1467080 0000000000000046 0000000000000000 ffffffff8160d020
[1013871.768944] 0000000000012780 ffff8807ed6edfd8 ffff8807ed6edfd8 ffff8807f1467080
[1013871.768946] 0000000000000202 0000000181350e01 ffffc90013a294e8 ffffc90013a294e0
[1013871.768948] Call Trace:
[1013871.768953] [] ? cv_wait_common+0xdf/0x175 [spl]
[1013871.768955] [] ? paravirt_read_tsc+0x5/0x8
[1013871.768957] [] ? read_tsc+0x5/0x14
[1013871.768958] [] ? add_wait_queue+0x3c/0x3c
[1013871.768960] [] ? should_resched+0x5/0x23
[1013871.768971] [] ? txg_quiesce_thread+0x23f/0x300 [zfs]
[1013871.768980] [] ? txg_sync_thread+0x507/0x507 [zfs]
[1013871.768984] [] ? thread_generic_wrapper+0x6a/0x75 [spl]
[1013871.768987] [] ? __thread_create+0x2bb/0x2bb [spl]
[1013871.768989] [] ? kthread+0x76/0x7e
[1013871.768991] [] ? kernel_thread_helper+0x4/0x10
[1013871.768993] [] ? kthread_worker_fn+0x139/0x139
[1013871.768994] [] ? gs_change+0x13/0x13
[1013871.769015] INFO: task java:9935 blocked for more than 120 seconds.
[1013871.769038] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.769066] java D ffff88081fa52780 0 9935 1 0x00000000
[1013871.769067] ffff8806052fa040 0000000000000082 ffffffff00000000 ffff88081bf9a180
[1013871.769069] 0000000000012780 ffff8806298fdfd8 ffff8806298fdfd8 ffff8806052fa040
[1013871.769071] 0000000000000000 00000001ffffffff ffffffff817ad548 ffff8807fb59fbe0
[1013871.769073] Call Trace:
[1013871.769075] [] ? rwsem_down_failed_common+0xe0/0x114
[1013871.769078] [] ? call_rwsem_down_read_failed+0x14/0x30
[1013871.769080] [] ? down_read+0x17/0x19
[1013871.769081] [] ? do_page_fault+0x1c3/0x345
[1013871.769083] [] ? sys_futex+0x120/0x151
[1013871.769085] [] ? page_fault+0x25/0x30
[1013871.769087] INFO: task java:9942 blocked for more than 120 seconds.
[1013871.769109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.769137] java D ffff88081fad2780 0 9942 1 0x00000000
[1013871.769139] ffff8807fbeb81c0 0000000000000082 ffffffff00000000 ffff88081bfd5100
[1013871.769141] 0000000000012780 ffff8801a5733fd8 ffff8801a5733fd8 ffff8807fbeb81c0
[1013871.769142] 0000000000000000 00000001ffffffff ffffffff817ace10 ffff8807fb59fbe0
[1013871.769144] Call Trace:
[1013871.769146] [] ? rwsem_down_failed_common+0xe0/0x114
[1013871.769149] [] ? futex_wake+0xe9/0xfb
[1013871.769150] [] ? call_rwsem_down_read_failed+0x14/0x30
[1013871.769152] [] ? down_read+0x17/0x19
[1013871.769153] [] ? do_page_fault+0x1c3/0x345
[1013871.769155] [] ? sys_futex+0x120/0x151
[1013871.769157] [] ? page_fault+0x25/0x30
[1013871.769158] INFO: task java:9944 blocked for more than 120 seconds.
[1013871.769180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.769208] java D ffff88081fa12780 0 9944 1 0x00000000
[1013871.769210] ffff880026938140 0000000000000082 0000000000000000 ffffffff8160d020
[1013871.769211] 0000000000012780 ffff88024449dfd8 ffff88024449dfd8 ffff880026938140
[1013871.769213] 0000000000000000 00000001ffffffff ffffffff817acc78 ffff8807fb59fbe0
[1013871.769215] Call Trace:
[1013871.769217] [] ? rwsem_down_failed_common+0xe0/0x114
[1013871.769218] [] ? futex_wake+0xe9/0xfb
[1013871.769220] [] ? call_rwsem_down_read_failed+0x14/0x30
[1013871.769222] [] ? down_read+0x17/0x19
[1013871.769223] [] ? do_page_fault+0x1c3/0x345
[1013871.769225] [] ? wake_up_new_task+0xb9/0xc2
[1013871.769226] [] ? sys_futex+0x120/0x151
[1013871.769228] [] ? fget_light+0x67/0x7b
[1013871.769230] [] ? page_fault+0x25/0x30
[1013871.769231] INFO: task java:21122 blocked for more than 120 seconds.
[1013871.769254] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.769281] java D ffff88081fb92780 0 21122 1 0x00000000
[1013871.769283] ffff880348745840 0000000000000082 ffff8806f5055ef0 ffff88081b844800
[1013871.769285] 0000000000012780 ffff880608a33fd8 ffff880608a33fd8 ffff880348745840
[1013871.769287] 00007fe07bcb2110 0000000000000014 ffff8807e0e091e8 ffff8807fb59fbe0
[1013871.769288] Call Trace:
[1013871.769290] [] ? rwsem_down_failed_common+0xe0/0x114
[1013871.769292] [] ? call_rwsem_down_write_failed+0x13/0x20
[1013871.769293] [] ? down_write+0x25/0x27
[1013871.769296] [] ? sys_mmap_pgoff+0xf0/0x160
[1013871.769297] [] ? system_call_fastpath+0x16/0x1b
[1013871.769299] INFO: task java:24556 blocked for more than 120 seconds.
[1013871.769322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.769350] java D ffff88081fa92780 0 24556 1 0x00000000
[1013871.769351] ffff88064aefc100 0000000000000082 ffffffff817ad728 ffff88081bfc5780
[1013871.769353] 0000000000012780 ffff88033e57dfd8 ffff88033e57dfd8 ffff88064aefc100
[1013871.769355] 0000000000000000 00000000ffffffff ffffffff817ad728 ffff8807fb59fbe0
[1013871.769357] Call Trace:
[1013871.769358] [] ? rwsem_down_failed_common+0xe0/0x114
[1013871.769360] [] ? call_rwsem_down_read_failed+0x14/0x30
[1013871.769362] [] ? down_read+0x17/0x19
[1013871.769363] [] ? do_page_fault+0x1c3/0x345
[1013871.769364] [] ? sys_futex+0x120/0x151
[1013871.769366] [] ? page_fault+0x25/0x30
[1013871.769367] INFO: task java:31030 blocked for more than 120 seconds.
[1013871.769390] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1013871.769417] java D ffff88081fa92780 0 31030 1 0x00000000
[1013871.769419] ffff88021c248800 0000000000000082 ffffffff00000000 ffff88081bfc5780
[1013871.769421] 0000000000012780 ffff8806492c5fd8 ffff8806492c5fd8 ffff88021c248800
[1013871.769423] 0000000000000000 00000001ffffffff ffffffff817aca20 ffff8807fb59fbe0
[1013871.769425] Call Trace:
[1013871.769426] [] ? rwsem_down_failed_common+0xe0/0x114
[1013871.769428] [] ? call_rwsem_down_read_failed+0x14/0x30
[1013871.769430] [] ? down_read+0x17/0x19
[1013871.769431] [] ? do_page_fault+0x1c3/0x345
[1013871.769432] [] ? sys_futex+0x120/0x151
[1013871.769434] [] ? page_fault+0x25/0x30

@kernelOfTruth
Copy link
Contributor

FYI:

Debian Wheezy uses linux-3.2.41 according to
http://distrowatch.com/table.php?distribution=debian
if that's correct - unless @usrflo uses a more recent kernel

It appears that there's a trend with "stable" distributions that run with 2.6* or 3.0_, 3.2_ kernels which suffer from this

@usrflo
Copy link

usrflo commented Mar 1, 2015

FYI:

# uname -a
Linux hostxx 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux

@usrflo
Copy link

usrflo commented Mar 29, 2015

I updated to a kernel 3.16 last week:
# uname -a
Linux host30 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt4-3~bpo70+1 (2015-02-12) x86_64 GNU/Linux

This night I experienced the same problem again with the new kernel.

Some hints from my monitoring:

  • the lock happens either in a ~7 day or ~14 day interval (may be after 6 days as well but I never saw e.g. a 2, 3, 4 day or 9, 10, 11 day interval)
  • it always happens at around 1:00 am GMT
  • there are no cronjobs related to this time of day
  • the server is used as a backup server (only), the locks happen on heavy I/O usage (in the daily monitoring there is a peak at 1:00 am GMT)
  • the backups are done by rsync so there are no days with substantially more backup data (i.e. there is no weekly full backup I could relate the locks every ~7 or ~14 days to)

Do you have any idea about the time correlation of these locks?

@kernelOfTruth
Copy link
Contributor

@usrflo I'd suspect that it's rather high i/o load and ARC/memory pressure that leads to (coincidentally the same timepoint) when the system has issues

If you're running zfs/master & spl/master
could you give the patch(set) referenced at #3235 (comment) a try ?
also please take a look at the related pull-request: #3225

also could you - in either case - post the data requested at #3235 (comment) ?

Is your configuration and system software (kernel, rsync, etc.) all stock ?

if not is using a patched rsync (to lower memory impact) an (temporary) option ?
http://insights.oetiker.ch/linux/fadvise/
https://tobi.oetiker.ch/patches/

that would add the

--drop-cache

option to rsync

can you run a kernel with

mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed

http://patchwork.ozlabs.org/patch/433475/

is CONFIG_PARAVIRT_SPINLOCKS and paravirt support an option ?
it could help in some corner cases: #3091

@iceisfun
Copy link

The condition can be induced by high stress/io load situations, I do not
think there is any regular schedule to it like you suggest.

On Sun, Mar 29, 2015 at 5:46 AM, kernelOfTruth aka. kOT, Gentoo user <
[email protected]> wrote:

@usrflo https://github.com/usrflo I'd suspect that it's rather high i/o
load and ARC/memory pressure that leads to (coincidentally the same
timepoint) when the system has issues

could you give the patch(set) referenced at #3235 (comment)
#3235 (comment) a
try ?


Reply to this email directly or view it on GitHub
#2272 (comment).

@usrflo
Copy link

usrflo commented Apr 19, 2015

@kernelOfTruth : thanks for so many proposals. I hesitate to test the patchset (#3235) on this productive system.
7 days ago I installed the workaround by patching rsync with the proposed --drop-cache option and didn't experience the problem anymore with drop-cache ✨.

If helpful for debugging I can give some more information about my system?

@usrflo
Copy link

usrflo commented May 29, 2015

Update: even with drop-cache the problem happened once again after the 19th of April with v0.6.3.
In between I switched to v0.6.4 and I didn't experience the problem again, the system is up for 26 days now.

@behlendorf
Copy link
Contributor

This should be resolved in 0.6.5.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants
@behlendorf @albertyann @iceisfun @usrflo @kernelOfTruth and others