spl_system_task blocked for more than 120s on debian 8 and zfs 0.6.4-1.1-1 #3402

Pentium100MHz · 2015-05-12T13:41:19Z

Hi, I noticed this in "dmesg" of a hypervisor (we are using zvols for kvm virtual machine storage). The operation of the server does not seem to be affected, however, it may be worth looking into.

[1005043.501920] INFO: task spl_system_task:651 blocked for more than 120 seconds.
[1005043.501971]       Tainted: P           O  3.16.0-4-amd64 #1
[1005043.501998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1005043.502042] spl_system_task D ffff8800bb0faff8     0   651      2 0x00000000
[1005043.502047]  ffff8800bb0faba0 0000000000000046 0000000000012f00 ffff880fff54bfd8
[1005043.502049]  0000000000012f00 ffff8800bb0faba0 ffff8804dbc4eeb8 ffff8804dbc4ee80
[1005043.502052]  ffff8804dbc4eec0 0000000000000000 ffff8804dbc4eeb8 ffff8806cda73d20
[1005043.502055] Call Trace:
[1005043.502075]  [<ffffffffa038364d>] ? cv_wait_common+0xcd/0x100 [spl]
[1005043.502081]  [<ffffffff810a7930>] ? prepare_to_wait_event+0xf0/0xf0
[1005043.502103]  [<ffffffffa06da05b>] ? traverse_prefetcher+0x9b/0x170 [zfs]
[1005043.502115]  [<ffffffffa06da5b7>] ? traverse_visitbp+0x487/0x840 [zfs]
[1005043.502128]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502140]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502152]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502164]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502177]  [<ffffffffa06db031>] ? traverse_dnode+0x71/0xe0 [zfs]
[1005043.502189]  [<ffffffffa06da7a2>] ? traverse_visitbp+0x672/0x840 [zfs]
[1005043.502201]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502213]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502225]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502237]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502249]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502261]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502274]  [<ffffffffa06db031>] ? traverse_dnode+0x71/0xe0 [zfs]
[1005043.502286]  [<ffffffffa06da88b>] ? traverse_visitbp+0x75b/0x840 [zfs]
[1005043.502298]  [<ffffffffa06daf54>] ? traverse_prefetch_thread+0x94/0x100 [zfs]
[1005043.502310]  [<ffffffffa06d9fc0>] ? prefetch_needed.isra.3+0x40/0x40 [zfs]
[1005043.502315]  [<ffffffffa037ff81>] ? taskq_thread+0x1a1/0x340 [spl]
[1005043.502318]  [<ffffffff81096bd0>] ? wake_up_state+0x10/0x10
[1005043.502322]  [<ffffffffa037fde0>] ? taskq_cancel_id+0x110/0x110 [spl]
[1005043.502326]  [<ffffffff81087edd>] ? kthread+0xbd/0xe0
[1005043.502328]  [<ffffffff81087e20>] ? kthread_create_on_node+0x180/0x180
[1005043.502332]  [<ffffffff81510d98>] ? ret_from_fork+0x58/0x90
[1005043.502334]  [<ffffffff81087e20>] ? kthread_create_on_node+0x180/0x180

This is on Debian 8.0 with zfs package version 0.6.4-1.1-1.

behlendorf · 2015-05-12T20:39:58Z

Yes, the warning here is just advisory. But thanks for posting it so we're aware of it.

odoucet · 2015-06-08T17:42:34Z

Got a similar dmesg message (same ZFS version, CentOS 6.5) while doing send/recv:

 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 spl_system_task D ffffffff81810d00     0  2944      2 0x00000000
 ffff883001f0d708 0000000000000046 ffff883001f0dfd8 0000000000013140
 ffff883001f0c010 0000000000013140 0000000000013140 0000000000013140
 ffff883001f0dfd8 0000000000013140 ffff88300348e100 ffff8818069a8fa0
 Call Trace:
 [<ffffffff815eefa9>] schedule+0x29/0x70
 [<ffffffffa027c1a5>] cv_wait_common+0x115/0x130 [spl]
 [<ffffffff810829a0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa027c215>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa02f39a2>] traverse_prefetcher+0xf2/0x1c0 [zfs]
 [<ffffffffa02d214c>] ? arc_buf_remove_ref+0x11c/0x170 [zfs]
 [<ffffffffa02f2d96>] traverse_visitbp+0x4a6/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f3171>] traverse_dnode+0x71/0xd0 [zfs]
 [<ffffffffa02f3020>] traverse_visitbp+0x730/0x810 [zfs]
 [<ffffffffa02f383b>] traverse_prefetch_thread+0x8b/0x100 [zfs]
 [<ffffffffa02f38b0>] ? traverse_prefetch_thread+0x100/0x100 [zfs]
 [<ffffffffa0278132>] taskq_thread+0x1f2/0x400 [spl]
 [<ffffffff81093c40>] ? try_to_wake_up+0x2c0/0x2c0
 [<ffffffffa0277f40>] ? task_expire+0x120/0x120 [spl]
 [<ffffffff8108218e>] kthread+0xce/0xe0
 [<ffffffff810820c0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815f9508>] ret_from_fork+0x58/0x90
 [<ffffffff810820c0>] ? kthread_freezable_should_stop+0x70/0x70

eolson78 · 2015-06-08T18:23:19Z

This looks similar to the issue I am seeing in #3450.

behlendorf · 2015-06-09T22:00:18Z

It sure does, thanks for cross-referencing these issues.

odoucet · 2015-06-10T08:54:28Z

Is this related to pull request #3482?

behlendorf · 2015-06-10T14:35:53Z

@odoucet yes. That patch should resolve the warning.

This is the counterpart to openzfs/spl@2345368 which replaces the cv_wait_interruptible() function with cv_wait_sig(). There is no functional change; the patch merely brings the function names into sync to maximize portability. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3450 Issue openzfs#3402

The Linux kernel watchdog will automatically dump a backtrace for any process which sleeps for over 120s in an uninterruptible state. The solution is for the prefetch thread to sleep in an interruptible state. The way the existing code was written, this is safe because when woken it will always reevaluate its conditional. As a general rule it is preferable to sleep in an interruptible state when possible. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3450 Issue openzfs#3402

The Linux kernel watchdog will automatically dump a backtrace for any process which sleeps for over 120s in an uninterruptible state. The solution is for the prefetch thread to sleep in an interruptible state. The way the existing code was written, this is safe because when woken it will always reevaluate its conditional. As a general rule it is preferable to sleep in an interruptible state when possible. Signed-off-by: Brian Behlendorf <[email protected]> Closes #3450 Closes #3402
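
The "always reevaluate its conditional" property mentioned in the commit message can be illustrated with a small userspace sketch. This is only an analogy in Python, not the SPL/ZFS code: Python has no notion of interruptible versus uninterruptible kernel sleep, and the names `PrefetchGate`, `poke`, `release` and `run_demo` are hypothetical. It shows why an extra wakeup is harmless when the sleeper re-checks its condition in a loop.

```python
import threading


class PrefetchGate:
    """Toy stand-in for a thread that sleeps until a condition holds."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False                # the condition being waited on
        self._asleep = threading.Event()   # set once the sleeper is about to wait
        self.wakeups = 0                   # how many times wait() returned

    def wait_until_ready(self):
        with self._cond:
            # Re-check the condition after every wakeup. A wakeup that
            # arrives for any other reason just sends us back to sleep.
            while not self._ready:
                self._asleep.set()
                self._cond.wait()
                self.wakeups += 1

    def poke(self):
        """Wake the sleeper WITHOUT making the condition true."""
        self._asleep.wait()
        with self._cond:
            self._cond.notify_all()

    def release(self):
        """Make the condition true and wake the sleeper."""
        with self._cond:
            self._ready = True
            self._cond.notify_all()


def run_demo():
    gate = PrefetchGate()
    sleeper = threading.Thread(target=gate.wait_until_ready)
    sleeper.start()
    gate.poke()      # stray wakeup: the condition is still false
    gate.release()   # real wakeup: the condition is now true
    sleeper.join(timeout=5)
    return (not sleeper.is_alive()), gate.wakeups


if __name__ == "__main__":
    finished, wakeups = run_demo()
    print("finished:", finished, "wakeups:", wakeups)
```

The sleeper only leaves the loop once `release()` has made the condition true, no matter how many times `poke()` wakes it early. That is the property the commit message relies on when it calls the change safe.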

Updates ZFS and SPL to latest maintenance version. Includes the following:

Bug Fixes:
* Fix panic due to corrupt nvlist when running utilities (openzfs/zfs#3335)
* Fix hard lockup due to infinite loop in zfs_zget() (openzfs/zfs#3349)
* Fix panic on unmount due to iput taskq (openzfs/zfs#3281)
* Improve metadata shrinker performance on pre-3.1 kernels (openzfs/zfs#3501)
* Linux 4.1 compat: use read_iter() / write_iter()
* Linux 3.12 compat: NUMA-aware per-superblock shrinker
* Fix spurious hung task watchdog stack traces (openzfs/zfs#3402)
* Fix module loading in zfs import systemd service (openzfs/zfs#3440)
* Fix intermittent libzfs_init() failure to open /dev/zfs (openzfs/zfs#2556)

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I053087317ff9e5bedc1671bb46062e96bfe6f074
Reviewed-on: http://review.whamcloud.com/15481
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>

behlendorf closed this as completed in 8e70975 Jun 16, 2015

galindro mentioned this issue Jun 23, 2015

ZFS send sometimes increases sender's load average #3518

Closed
