Stuck on CV_wait in dmu_traverse.c #3450
Comments
@behlendorf that commit is included in my current ZFS build. I just confirmed that we had cherry-picked it at the recommendation of @dweeezil. Some further info: I was able to attach GDB to the machine, and 0x7ebd is in cv_wait_common (/var/lib/dkms/spl/0.6.4.1/build/module/spl/../../module/spl/spl-condvar.c:106). The above was my output.
@behlendorf just to confirm, I went ahead and created another system and confirmed 100% that the above commit was cherry-picked into a 0.6.4.1 build. I am seeing the same issue. See below.
Jun 6 12:25:49 ip-50-0-0-7 kernel: INFO: task spl_system_task:1893 blocked for more than 120 seconds.
Will do. I will start another test and report back.
On Jun 9, 2015, at 5:59 PM, "Brian Behlendorf" wrote:
@eolson78 could you try reverting commit b738bc5. This was a recent change which touched the prefetch code you're having trouble with and potentially introduced this regression. It would be helpful if we could identify whether it's responsible.
@eolson78 or better yet try #3482. The issue here is entirely benign, the kernel just doesn't like the prefetch thread sleeping in an uninterruptible state for over 2 minutes. Since the code is already written to safely handle a signal the patch updates the code to sleep in an interruptible state which should appease the watchdog. As an aside I just learned illumos does have an interface for this called |
This is the counterpart to openzfs/spl@2345368 which replaces the cv_wait_interruptible() function with cv_wait_sig(). There is no functional change; the patch merely brings the function names into sync to maximize portability. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3450 Issue openzfs#3402
The Linux kernel watchdog will automatically dump a backtrace for any process which sleeps for over 120s in an uninterruptible state. The solution is for the prefetch thread to sleep in an interruptible state. The way the existing code was written this is safe, because when woken it will always reevaluate its conditional. As a general rule it is preferable to sleep in an interruptible state when possible. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#3450 Issue openzfs#3402
The Linux kernel watchdog will automatically dump a backtrace for any process which sleeps for over 120s in an uninterruptible state. The solution is for the prefetch thread to sleep in an interruptible state. The way the existing code was written this is safe, because when woken it will always reevaluate its conditional. As a general rule it is preferable to sleep in an interruptible state when possible. Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#3450 Closes openzfs#3402
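The safety argument in the commit message (a woken thread always reevaluates its conditional) can be illustrated with a userland analogy. This is only a sketch: Python's threading.Condition stands in for the kernel condition variable, and the names wait_until_ready, mark_ready, and the state dictionary are hypothetical, not taken from the ZFS or SPL sources.

```python
import threading

cv = threading.Condition()
state = {"ready": False}  # hypothetical condition the waiter sleeps on


def wait_until_ready(poll_seconds=0.05):
    """Sleep on the condition variable, re-checking the predicate after every wakeup."""
    with cv:
        # A loop rather than a single "if": whatever caused the wakeup
        # (notify, timeout, or a spurious wake), the predicate is checked
        # again before the waiter proceeds.
        while not state["ready"]:
            cv.wait(timeout=poll_seconds)
        return state["ready"]


def mark_ready():
    """Satisfy the predicate and wake any waiters."""
    with cv:
        state["ready"] = True
        cv.notify_all()


results = []
waiter = threading.Thread(target=lambda: results.append(wait_until_ready()))
waiter.start()

# Wake the waiter early without satisfying the predicate; it simply
# re-checks the condition and goes back to sleep.
with cv:
    cv.notify_all()

mark_ready()
waiter.join(timeout=5)
print(results)
```

Because the waiter only leaves the loop once the predicate holds, an early or unrelated wakeup is harmless. That is the property the patch relies on when it switches to an interruptible sleep.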
I can confirm that commit 8e70975 does in fact solve my issue. @behlendorf thank you a ton.
Hello, I have a CentOS 6.5 system running kernel version 3.14.4 and ZFS version 0.6.4.1. It is serving a two-client NFS-based filebench workload, and after about 18-19 hours of running I can fairly consistently produce a hang that causes the clients to get disconnected. Below are the stack traces.
May 28 13:18:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.
May 28 13:18:44 ip-50-0-0-28 kernel: Tainted: P O 3.14.4 #1
May 28 13:18:44 ip-50-0-0-28 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 13:18:44 ip-50-0-0-28 kernel: spl_system_task D ffff88077c014440 0 1425 2 0x00000000
May 28 13:18:44 ip-50-0-0-28 kernel: ffff880750eef558 0000000000000246 ffff880750e78e70 ffff880750eee010
May 28 13:18:44 ip-50-0-0-28 kernel: 0000000000014440 0000000000014440 ffff880750e78e70 ffff8805062bef70
May 28 13:18:44 ip-50-0-0-28 kernel: ffff880750eef568 ffff8804aadd1e38 ffff8804aadd1e00 ffff880750eef588
May 28 13:18:44 ip-50-0-0-28 kernel: Call Trace:
May 28 13:18:44 ip-50-0-0-28 kernel: [] schedule+0x29/0x70
May 28 13:18:44 ip-50-0-0-28 kernel: [] cv_wait_common+0xfd/0x130 [spl]
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? bit_waitqueue+0xe0/0xe0
May 28 13:18:44 ip-50-0-0-28 kernel: [] __cv_wait+0x15/0x20 [spl]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_prefetcher+0xf2/0x1c0 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x4b6/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? __kmalloc_node+0x3c/0x50
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? l2arc_add_vdev+0x160/0x160 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x697/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x752/0x820 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? xen_load_gs_index+0x17/0x30
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? __switch_to+0x4a9/0x5e0
May 28 13:18:44 ip-50-0-0-28 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? traverse_prefetch_thread+0x100/0x100 [zfs]
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? _raw_spin_unlock_irqrestore+0x34/0xb0
May 28 13:18:44 ip-50-0-0-28 kernel: [] taskq_thread+0x1f2/0x400 [spl]
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? try_to_wake_up+0x230/0x230
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? task_expire+0x120/0x120 [spl]
May 28 13:18:44 ip-50-0-0-28 kernel: [] kthread+0xce/0xf0
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? xen_end_context_switch+0x1e/0x30
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:18:44 ip-50-0-0-28 kernel: [] ret_from_fork+0x7c/0xb0
May 28 13:18:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:22:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.
May 28 13:22:44 ip-50-0-0-28 kernel: Tainted: P O 3.14.4 #1
May 28 13:22:44 ip-50-0-0-28 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 13:22:44 ip-50-0-0-28 kernel: spl_system_task D ffff88077c014440 0 1425 2 0x00000000
May 28 13:22:44 ip-50-0-0-28 kernel: ffff880750eef5f8 0000000000000246 ffff880750eef608 ffff880750eee010
May 28 13:22:44 ip-50-0-0-28 kernel: 0000000000014440 0000000000014440 ffff880750e78e70 ffff8802b8e7a1d0
May 28 13:22:44 ip-50-0-0-28 kernel: ffff880750eef608 ffff8804aadd1e38 ffff8804aadd1e00 ffff880750eef628
May 28 13:22:44 ip-50-0-0-28 kernel: Call Trace:
May 28 13:22:44 ip-50-0-0-28 kernel: [] schedule+0x29/0x70
May 28 13:22:44 ip-50-0-0-28 kernel: [] cv_wait_common+0xfd/0x130 [spl]
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? bit_waitqueue+0xe0/0xe0
May 28 13:22:44 ip-50-0-0-28 kernel: [] __cv_wait+0x15/0x20 [spl]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_prefetcher+0xf2/0x1c0 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? arc_buf_remove_ref+0x123/0x170 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x4b6/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x697/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x752/0x820 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? xen_load_gs_index+0x17/0x30
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? __switch_to+0x4a9/0x5e0
May 28 13:22:44 ip-50-0-0-28 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? traverse_prefetch_thread+0x100/0x100 [zfs]
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? _raw_spin_unlock_irqrestore+0x34/0xb0
May 28 13:22:44 ip-50-0-0-28 kernel: [] taskq_thread+0x1f2/0x400 [spl]
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? try_to_wake_up+0x230/0x230
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? task_expire+0x120/0x120 [spl]
May 28 13:22:44 ip-50-0-0-28 kernel: [] kthread+0xce/0xf0
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? xen_end_context_switch+0x1e/0x30
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:22:44 ip-50-0-0-28 kernel: [] ret_from_fork+0x7c/0xb0
May 28 13:22:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:24:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.
May 28 13:24:44 ip-50-0-0-28 kernel: Tainted: P O 3.14.4 #1
May 28 13:24:44 ip-50-0-0-28 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 13:24:44 ip-50-0-0-28 kernel: spl_system_task D ffff88077c014440 0 1425 2 0x00000000
May 28 13:24:44 ip-50-0-0-28 kernel: ffff880750eef5f8 0000000000000246 ffff880750eef608 ffff880750eee010
May 28 13:24:44 ip-50-0-0-28 kernel: 0000000000014440 0000000000014440 ffff880750e78e70 ffff8802b8e7a1d0
May 28 13:24:44 ip-50-0-0-28 kernel: ffff880750eef608 ffff8804aadd1e38 ffff8804aadd1e00 ffff880750eef628
May 28 13:24:44 ip-50-0-0-28 kernel: Call Trace:
May 28 13:24:44 ip-50-0-0-28 kernel: [] schedule+0x29/0x70
May 28 13:24:44 ip-50-0-0-28 kernel: [] cv_wait_common+0xfd/0x130 [spl]
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? bit_waitqueue+0xe0/0xe0
May 28 13:24:44 ip-50-0-0-28 kernel: [] __cv_wait+0x15/0x20 [spl]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_prefetcher+0xf2/0x1c0 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? arc_buf_remove_ref+0x123/0x170 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x4b6/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x697/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x752/0x820 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? xen_load_gs_index+0x17/0x30
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? __switch_to+0x4a9/0x5e0
May 28 13:24:44 ip-50-0-0-28 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? traverse_prefetch_thread+0x100/0x100 [zfs]
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? _raw_spin_unlock_irqrestore+0x34/0xb0
May 28 13:24:44 ip-50-0-0-28 kernel: [] taskq_thread+0x1f2/0x400 [spl]
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? try_to_wake_up+0x230/0x230
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? task_expire+0x120/0x120 [spl]
May 28 13:24:44 ip-50-0-0-28 kernel: [] kthread+0xce/0xf0
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? xen_end_context_switch+0x1e/0x30
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:24:44 ip-50-0-0-28 kernel: [] ret_from_fork+0x7c/0xb0
May 28 13:24:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:26:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.
May 28 13:26:44 ip-50-0-0-28 kernel: Tainted: P O 3.14.4 #1
May 28 13:26:44 ip-50-0-0-28 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 13:26:44 ip-50-0-0-28 kernel: spl_system_task D ffff88077c014440 0 1425 2 0x00000000
May 28 13:26:44 ip-50-0-0-28 kernel: ffff880750eef5f8 0000000000000246 ffff880750eef608 ffff880750eee010
May 28 13:26:44 ip-50-0-0-28 kernel: 0000000000014440 0000000000014440 ffff880750e78e70 ffff8802b8e7a1d0
May 28 13:26:44 ip-50-0-0-28 kernel: ffff880750eef608 ffff8804aadd1e38 ffff8804aadd1e00 ffff880750eef628
May 28 13:26:44 ip-50-0-0-28 kernel: Call Trace:
May 28 13:26:44 ip-50-0-0-28 kernel: [] schedule+0x29/0x70
May 28 13:26:44 ip-50-0-0-28 kernel: [] cv_wait_common+0xfd/0x130 [spl]
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? bit_waitqueue+0xe0/0xe0
May 28 13:26:44 ip-50-0-0-28 kernel: [] __cv_wait+0x15/0x20 [spl]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_prefetcher+0xf2/0x1c0 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? arc_buf_remove_ref+0x123/0x170 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x4b6/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x697/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x752/0x820 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? xen_load_gs_index+0x17/0x30
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? __switch_to+0x4a9/0x5e0
May 28 13:26:44 ip-50-0-0-28 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? traverse_prefetch_thread+0x100/0x100 [zfs]
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? _raw_spin_unlock_irqrestore+0x34/0xb0
May 28 13:26:44 ip-50-0-0-28 kernel: [] taskq_thread+0x1f2/0x400 [spl]
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? try_to_wake_up+0x230/0x230
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? task_expire+0x120/0x120 [spl]
May 28 13:26:44 ip-50-0-0-28 kernel: [] kthread+0xce/0xf0
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? xen_end_context_switch+0x1e/0x30
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:26:44 ip-50-0-0-28 kernel: [] ret_from_fork+0x7c/0xb0
May 28 13:26:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:28:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.
May 28 13:28:44 ip-50-0-0-28 kernel: Tainted: P O 3.14.4 #1
May 28 13:28:44 ip-50-0-0-28 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 13:28:44 ip-50-0-0-28 kernel: spl_system_task D ffff88077c014440 0 1425 2 0x00000000
May 28 13:28:44 ip-50-0-0-28 kernel: ffff880750eef5f8 0000000000000246 ffff880750eef608 ffff880750eee010
May 28 13:28:44 ip-50-0-0-28 kernel: 0000000000014440 0000000000014440 ffff880750e78e70 ffff8802b8e7a1d0
May 28 13:28:44 ip-50-0-0-28 kernel: ffff880750eef608 ffff8804aadd1e38 ffff8804aadd1e00 ffff880750eef628
May 28 13:28:44 ip-50-0-0-28 kernel: Call Trace:
May 28 13:28:44 ip-50-0-0-28 kernel: [] schedule+0x29/0x70
May 28 13:28:44 ip-50-0-0-28 kernel: [] cv_wait_common+0xfd/0x130 [spl]
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? bit_waitqueue+0xe0/0xe0
May 28 13:28:44 ip-50-0-0-28 kernel: [] __cv_wait+0x15/0x20 [spl]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_prefetcher+0xf2/0x1c0 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? arc_buf_remove_ref+0x123/0x170 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x4b6/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x697/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x752/0x820 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? xen_load_gs_index+0x17/0x30
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? __switch_to+0x4a9/0x5e0
May 28 13:28:44 ip-50-0-0-28 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? traverse_prefetch_thread+0x100/0x100 [zfs]
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? _raw_spin_unlock_irqrestore+0x34/0xb0
May 28 13:28:44 ip-50-0-0-28 kernel: [] taskq_thread+0x1f2/0x400 [spl]
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? try_to_wake_up+0x230/0x230
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? task_expire+0x120/0x120 [spl]
May 28 13:28:44 ip-50-0-0-28 kernel: [] kthread+0xce/0xf0
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? xen_end_context_switch+0x1e/0x30
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:28:44 ip-50-0-0-28 kernel: [] ret_from_fork+0x7c/0xb0
May 28 13:28:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:30:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.
May 28 13:30:44 ip-50-0-0-28 kernel: Tainted: P O 3.14.4 #1
May 28 13:30:44 ip-50-0-0-28 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 13:30:44 ip-50-0-0-28 kernel: spl_system_task D ffff88077c014440 0 1425 2 0x00000000
May 28 13:30:44 ip-50-0-0-28 kernel: ffff880750eef5f8 0000000000000246 ffff880750eef608 ffff880750eee010
May 28 13:30:44 ip-50-0-0-28 kernel: 0000000000014440 0000000000014440 ffff880750e78e70 ffff8802b8e7a1d0
May 28 13:30:44 ip-50-0-0-28 kernel: ffff880750eef608 ffff8804aadd1e38 ffff8804aadd1e00 ffff880750eef628
May 28 13:30:44 ip-50-0-0-28 kernel: Call Trace:
May 28 13:30:44 ip-50-0-0-28 kernel: [] schedule+0x29/0x70
May 28 13:30:44 ip-50-0-0-28 kernel: [] cv_wait_common+0xfd/0x130 [spl]
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? bit_waitqueue+0xe0/0xe0
May 28 13:30:44 ip-50-0-0-28 kernel: [] __cv_wait+0x15/0x20 [spl]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_prefetcher+0xf2/0x1c0 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? arc_buf_remove_ref+0x123/0x170 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x4b6/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x697/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x42b/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_dnode+0x71/0xd0 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_visitbp+0x752/0x820 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? xen_load_gs_index+0x17/0x30
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? __switch_to+0x4a9/0x5e0
May 28 13:30:44 ip-50-0-0-28 kernel: [] traverse_prefetch_thread+0x8b/0x100 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? traverse_prefetch_thread+0x100/0x100 [zfs]
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? _raw_spin_unlock_irqrestore+0x34/0xb0
May 28 13:30:44 ip-50-0-0-28 kernel: [] taskq_thread+0x1f2/0x400 [spl]
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? try_to_wake_up+0x230/0x230
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? task_expire+0x120/0x120 [spl]
May 28 13:30:44 ip-50-0-0-28 kernel: [] kthread+0xce/0xf0
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? xen_end_context_switch+0x1e/0x30
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
May 28 13:30:44 ip-50-0-0-28 kernel: [] ret_from_fork+0x7c/0xb0
May 28 13:30:44 ip-50-0-0-28 kernel: [] ? kthread_freezable_should_stop+0x70/0x70
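Since the same hung-task report repeats every couple of minutes, it can help to reduce the log to just the report headers. The following is a rough sketch, assuming the syslog line layout shown above; the regular expression and the helper name hung_task_reports are illustrative, not part of any ZFS or kernel tooling.

```python
import re
from collections import Counter

# Matches the "INFO: task NAME:PID blocked for more than N seconds." header line.
HUNG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def hung_task_reports(lines):
    """Yield (timestamp, task, pid, seconds) for each hung-task header line."""
    for line in lines:
        m = HUNG_RE.match(line.strip())
        if m:
            yield m["stamp"], m["task"], int(m["pid"]), int(m["secs"])


# Sample lines copied from the trace above.
sample = [
    "May 28 13:18:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.",
    "May 28 13:18:44 ip-50-0-0-28 kernel: Call Trace:",
    "May 28 13:22:44 ip-50-0-0-28 kernel: INFO: task spl_system_task:1425 blocked for more than 120 seconds.",
]

reports = list(hung_task_reports(sample))
counts = Counter((task, pid) for _, task, pid, _ in reports)
print(reports)
print(counts)
```

Run against the full log, this would show whether it is always the same task and PID being reported, as appears to be the case here (spl_system_task, PID 1425).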
The output of zpool events -v:
May 27 2015 13:59:02.849404350 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x2
vdev_guid = 0x8a564211ec3af525
vdev_state = 0x7
time = 0x556605e6 0x32a0e1be
eid = 0x1
May 27 2015 13:59:02.867404025 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x2
vdev_guid = 0x197d0636386e116d
vdev_state = 0x7
time = 0x556605e6 0x33b388f9
eid = 0x2
May 27 2015 13:59:02.891403591 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x2
vdev_guid = 0x282070ca942e2dcf
vdev_state = 0x7
time = 0x556605e6 0x3521bd47
eid = 0x3
May 27 2015 13:59:02.953402471 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
vdev_guid = 0x197d0636386e116d
vdev_state = 0x7
time = 0x556605e6 0x38d3c467
eid = 0x4
May 27 2015 13:59:02.981401965 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
vdev_guid = 0x282070ca942e2dcf
vdev_state = 0x7
time = 0x556605e6 0x3a7f016d
eid = 0x5
May 27 2015 13:59:03.020401261 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
vdev_guid = 0x197d0636386e116d
vdev_state = 0x7
time = 0x556605e7 0x1374c6d
eid = 0x6
May 27 2015 13:59:03.031401062 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
vdev_guid = 0x282070ca942e2dcf
vdev_state = 0x7
time = 0x556605e7 0x1df2466
eid = 0x7
May 27 2015 13:59:03.042400863 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
vdev_guid = 0x197d0636386e116d
vdev_state = 0x7
time = 0x556605e7 0x286fc5f
eid = 0x8
May 27 2015 13:59:03.047400773 resource.fs.zfs.statechange
version = 0x0
class = "resource.fs.zfs.statechange"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
vdev_guid = 0x282070ca942e2dcf
vdev_state = 0x7
time = 0x556605e7 0x2d34745
eid = 0x9
May 27 2015 13:59:03.133399219 ereport.fs.zfs.config.sync
class = "ereport.fs.zfs.config.sync"
ena = 0x4fb16653d00401
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x964d09a4063653d9
(end detector)
pool = "poolrc1"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
pool_failmode = "wait"
time = 0x556605e7 0x7f382b3
eid = 0xa
May 27 2015 13:59:03.223397593 ereport.fs.zfs.config.sync
class = "ereport.fs.zfs.config.sync"
ena = 0x50071617a00401
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x964d09a4063653d9
(end detector)
pool = "poolrc1"
pool_guid = 0x964d09a4063653d9
pool_context = 0x0
pool_failmode = "wait"
time = 0x556605e7 0xd50c6d9
eid = 0xb
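The time fields in the events above appear to be a pair of hexadecimal values, seconds since the Unix epoch followed by nanoseconds. That reading is an inference from the output rather than something stated in this thread: the second value of the first event decodes to the same fractional part as its header line. A small sketch to decode one, with decode_event_time as a hypothetical helper name:

```python
from datetime import datetime, timezone


def decode_event_time(field):
    """Split 'SECONDS NANOSECONDS' (both hex) into a UTC datetime and the nanosecond part."""
    sec_hex, nsec_hex = field.split()
    seconds = int(sec_hex, 16)
    nanoseconds = int(nsec_hex, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanoseconds


# Value taken from the first event (eid = 0x1) above.
when, nsec = decode_event_time("0x556605e6 0x32a0e1be")
print(when.isoformat(), nsec)  # 2015-05-27T17:59:02+00:00 849404350
```

The decoded nanoseconds match the .849404350 in the event header. The four-hour difference from the 13:59:02 header would be consistent with the header being printed in a local time zone four hours behind UTC, though that is an assumption.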
It appears to me that this is the cv_wait call I am getting stuck on:
https://github.com/zfsonlinux/zfs/blob/7fec46b9d8967109ad289d208e8cf36a0c16e40c/module/zfs/dmu_traverse.c#L253
Any thoughts or insights would be greatly appreciated.