spl_system_task blocked for more than 120s on debian 8 and zfs 0.6.4-1.1-1 #3402

Closed
Pentium100MHz opened this issue May 12, 2015 · 6 comments

Comments

@Pentium100MHz

Hi, I noticed this in the "dmesg" output of a hypervisor (we are using zvols for KVM virtual machine storage). The operation of the server does not seem to be affected; however, it may be worth looking into.

[1005043.501920] INFO: task spl_system_task:651 blocked for more than 120 seconds.
[1005043.501971]       Tainted: P           O  3.16.0-4-amd64 #1
[1005043.501998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1005043.502042] spl_system_task D ffff8800bb0faff8     0   651      2 0x00000000
[1005043.502047]  ffff8800bb0faba0 0000000000000046 0000000000012f00 ffff880fff54bfd8
[1005043.502049]  0000000000012f00 ffff8800bb0faba0 ffff8804dbc4eeb8 ffff8804dbc4ee80
[1005043.502052]  ffff8804dbc4eec0 0000000000000000 ffff8804dbc4eeb8 ffff8806cda73d20
[1005043.502055] Call Trace:
[1005043.502075]  [<ffffffffa038364d>] ? cv_wait_common+0xcd/0x100 [spl]
[1005043.502081]  [<ffffffff810a7930>] ? prepare_to_wait_event+0xf0/0xf0
[1005043.502103]  [<ffffffffa06da05b>] ? traverse_prefetcher+0x9b/0x170 [zfs]
[1005043.502115]  [<ffffffffa06da5b7>] ? traverse_visitbp+0x487/0x840 [zfs]
[1005043.502128]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502140]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502152]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502164]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502177]  [<ffffffffa06db031>] ? traverse_dnode+0x71/0xe0 [zfs]
[1005043.502189]  [<ffffffffa06da7a2>] ? traverse_visitbp+0x672/0x840 [zfs]
[1005043.502201]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502213]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502225]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502237]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502249]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502261]  [<ffffffffa06da432>] ? traverse_visitbp+0x302/0x840 [zfs]
[1005043.502274]  [<ffffffffa06db031>] ? traverse_dnode+0x71/0xe0 [zfs]
[1005043.502286]  [<ffffffffa06da88b>] ? traverse_visitbp+0x75b/0x840 [zfs]
[1005043.502298]  [<ffffffffa06daf54>] ? traverse_prefetch_thread+0x94/0x100 [zfs]
[1005043.502310]  [<ffffffffa06d9fc0>] ? prefetch_needed.isra.3+0x40/0x40 [zfs]
[1005043.502315]  [<ffffffffa037ff81>] ? taskq_thread+0x1a1/0x340 [spl]
[1005043.502318]  [<ffffffff81096bd0>] ? wake_up_state+0x10/0x10
[1005043.502322]  [<ffffffffa037fde0>] ? taskq_cancel_id+0x110/0x110 [spl]
[1005043.502326]  [<ffffffff81087edd>] ? kthread+0xbd/0xe0
[1005043.502328]  [<ffffffff81087e20>] ? kthread_create_on_node+0x180/0x180
[1005043.502332]  [<ffffffff81510d98>] ? ret_from_fork+0x58/0x90
[1005043.502334]  [<ffffffff81087e20>] ? kthread_create_on_node+0x180/0x180

This is on Debian 8.0, with zfs package version 0.6.4-1.1-1.

@behlendorf
Contributor

Yes, the warning here is just advisory. But thanks for posting it so we're aware of it.

@odoucet

odoucet commented Jun 8, 2015

Got a similar dmesg message (same ZFS version, CentOS 6.5) while doing send/recv:

 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 spl_system_task D ffffffff81810d00     0  2944      2 0x00000000
 ffff883001f0d708 0000000000000046 ffff883001f0dfd8 0000000000013140
 ffff883001f0c010 0000000000013140 0000000000013140 0000000000013140
 ffff883001f0dfd8 0000000000013140 ffff88300348e100 ffff8818069a8fa0
 Call Trace:
 [<ffffffff815eefa9>] schedule+0x29/0x70
 [<ffffffffa027c1a5>] cv_wait_common+0x115/0x130 [spl]
 [<ffffffff810829a0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa027c215>] __cv_wait+0x15/0x20 [spl]
 [<ffffffffa02f39a2>] traverse_prefetcher+0xf2/0x1c0 [zfs]
 [<ffffffffa02d214c>] ? arc_buf_remove_ref+0x11c/0x170 [zfs]
 [<ffffffffa02f2d96>] traverse_visitbp+0x4a6/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f2d0b>] traverse_visitbp+0x41b/0x810 [zfs]
 [<ffffffffa02f3171>] traverse_dnode+0x71/0xd0 [zfs]
 [<ffffffffa02f3020>] traverse_visitbp+0x730/0x810 [zfs]
 [<ffffffffa02f383b>] traverse_prefetch_thread+0x8b/0x100 [zfs]
 [<ffffffffa02f38b0>] ? traverse_prefetch_thread+0x100/0x100 [zfs]
 [<ffffffffa0278132>] taskq_thread+0x1f2/0x400 [spl]
 [<ffffffff81093c40>] ? try_to_wake_up+0x2c0/0x2c0
 [<ffffffffa0277f40>] ? task_expire+0x120/0x120 [spl]
 [<ffffffff8108218e>] kthread+0xce/0xe0
 [<ffffffff810820c0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815f9508>] ret_from_fork+0x58/0x90
 [<ffffffff810820c0>] ? kthread_freezable_should_stop+0x70/0x70

@eolson78

eolson78 commented Jun 8, 2015

This looks similar to the issue I am seeing in #3450.

@behlendorf
Contributor

It sure does, thanks for cross-referencing these issues.

@odoucet

odoucet commented Jun 10, 2015

Is this related to pull request #3482?

@behlendorf
Contributor

@odoucet yes. That patch should resolve the warning.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 11, 2015
This is the counterpart to openzfs/spl@2345368 which replaces the
cv_wait_interruptible() function with cv_wait_sig().  There is no
functional change; the patch merely brings the function names into
sync to maximize portability.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3450
Issue openzfs#3402
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 11, 2015
The Linux kernel watchdog will automatically dump a backtrace for
any process which sleeps for over 120s in an uninterruptible state.

The solution is for the prefetch thread to sleep in an interruptible
state.  The way the existing code was written this is safe because
when woken it will always reevaluate its conditional.  As a general
rule it is preferable to sleep in an interruptible state when possible.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3450
Issue openzfs#3402
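
The commit message above argues that an interruptible sleep is safe because the waiter re-evaluates its conditional every time it wakes. Below is a minimal userspace sketch of that wait-loop pattern, using POSIX threads as a stand-in. It is not the ZFS or SPL code: `struct prefetch_state`, `waiter()` and `run_demo()` are hypothetical names invented for illustration, and `pthread_cond_wait()` only approximates the kernel-side primitives named in the commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state a prefetch thread waits on. */
struct prefetch_state {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            ready;    /* the condition the waiter sleeps on */
    int             wakeups;  /* how often the waiter returned from its sleep */
};

static void *waiter(void *arg)
{
    struct prefetch_state *st = arg;

    pthread_mutex_lock(&st->lock);
    /* Re-check the condition after every wakeup, so a wakeup that
     * arrives before the condition holds is harmless. */
    while (!st->ready) {
        pthread_cond_wait(&st->cv, &st->lock);
        st->wakeups++;
    }
    assert(st->ready);  /* the loop only exits once the condition holds */
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Wakes the waiter `early_wakeups` times without satisfying the
 * condition, then satisfies it. Returns the number of wakeups the
 * waiter observed, or -1 on error. The count depends on scheduling;
 * correctness does not. */
int run_demo(int early_wakeups)
{
    struct prefetch_state st = { .ready = false, .wakeups = 0 };
    pthread_t tid;
    int result;

    pthread_mutex_init(&st.lock, NULL);
    pthread_cond_init(&st.cv, NULL);
    if (pthread_create(&tid, NULL, waiter, &st) != 0)
        return -1;

    for (int i = 0; i < early_wakeups; i++) {
        pthread_mutex_lock(&st.lock);
        pthread_cond_broadcast(&st.cv);  /* wake, condition still false */
        pthread_mutex_unlock(&st.lock);
    }

    pthread_mutex_lock(&st.lock);
    st.ready = true;                     /* now satisfy the condition */
    pthread_cond_broadcast(&st.cv);
    pthread_mutex_unlock(&st.lock);

    pthread_join(tid, NULL);
    result = st.ready ? st.wakeups : -1;

    pthread_cond_destroy(&st.cv);
    pthread_mutex_destroy(&st.lock);
    return result;
}
```

Calling `run_demo(3)` from a `main()` (built with `-pthread`) returns a non-negative wakeup count regardless of how many of the early wakeups actually reached the waiter. The sketch illustrates only the re-check loop that the commit message relies on, not the kernel's hung-task accounting.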
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jun 14, 2015
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jun 20, 2015
Closes openzfs#3450
Closes openzfs#3402
behlendorf added a commit that referenced this issue Jun 24, 2015
Closes #3450
Closes #3402
MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this issue Aug 10, 2015
Updates ZFS and SPL to latest maintenance version.  Includes the
following:

Bug Fixes:
* Fix panic due to corrupt nvlist when running utilities
(openzfs/zfs#3335)
* Fix hard lockup due to infinite loop in zfs_zget()
(openzfs/zfs#3349)
* Fix panic on unmount due to iput taskq (openzfs/zfs#3281)
* Improve metadata shrinker performance on pre-3.1 kernels
(openzfs/zfs#3501)
* Linux 4.1 compat: use read_iter() / write_iter()
* Linux 3.12 compat: NUMA-aware per-superblock shrinker
* Fix spurious hung task watchdog stack traces (openzfs/zfs#3402)
* Fix module loading in zfs import systemd service
(openzfs/zfs#3440)
* Fix intermittent libzfs_init() failure to open /dev/zfs
(openzfs/zfs#2556)

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I053087317ff9e5bedc1671bb46062e96bfe6f074
Reviewed-on: http://review.whamcloud.com/15481
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>