thread hang #3033

Closed
ldonzis opened this issue Jan 23, 2015 · 7 comments

Comments

@ldonzis

ldonzis commented Jan 23, 2015

This looks similar to some other reports, but they don't appear to be exactly the same.

I'm not even sure of the cause... at the moment, one pool is resilvering, and there are two send/recv operations going on (this is the sending side).

This is on Ubuntu 12.04 w/ kernel 3.2.0-75, if that helps.

Thanks,
lew

[ 240.448023] INFO: task spl_system_task:332 blocked for more than 120 seconds.
[ 240.448450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.448897] spl_system_task D ffffffff81806200 0 332 2 0x00000000
[ 240.448901] ffff880413379710 0000000000000046 ffff8804133796f0 ffffffffa019146e
[ 240.448906] ffff880413379fd8 ffff880413379fd8 ffff880413379fd8 00000000000127c0
[ 240.448909] ffff8804155b9700 ffff880410e9ae00 ffff880413379720 ffff8803f55d4620
[ 240.448913] Call Trace:
[ 240.448958] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 240.448964] [] schedule+0x3f/0x60
[ 240.448978] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 240.448983] [] ? add_wait_queue+0x60/0x60
[ 240.448990] [] __cv_wait+0x15/0x20 [spl]
[ 240.449010] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 240.449030] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 240.449049] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.449069] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.449088] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.449108] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.449127] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.449146] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.449166] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 240.449185] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 240.449189] [] ? dequeue_task_fair+0xb7/0x100
[ 240.449194] [] ? __switch_to+0xf5/0x360
[ 240.449213] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 240.449232] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 240.449239] [] taskq_thread+0x236/0x4b0 [spl]
[ 240.449243] [] ? try_to_wake_up+0x200/0x200
[ 240.449249] [] ? task_done+0x160/0x160 [spl]
[ 240.449252] [] kthread+0x8c/0xa0
[ 240.449257] [] kernel_thread_helper+0x4/0x10
[ 240.449260] [] ? flush_kthread_worker+0xa0/0xa0
[ 240.449263] [] ? gs_change+0x13/0x13
[ 240.449265] INFO: task spl_system_task:333 blocked for more than 120 seconds.
[ 240.449673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.450119] spl_system_task D ffffffff81806200 0 333 2 0x00000000
[ 240.450122] ffff88041337b710 0000000000000046 ffff88041337b6f0 ffffffffa019146e
[ 240.450126] ffff88041337bfd8 ffff88041337bfd8 ffff88041337bfd8 00000000000127c0
[ 240.450130] ffff880415581700 ffff880410e98000 ffff88041337b720 ffff8803e3d291a0
[ 240.450133] Call Trace:
[ 240.450147] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 240.450150] [] schedule+0x3f/0x60
[ 240.450157] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 240.450160] [] ? add_wait_queue+0x60/0x60
[ 240.450167] [] __cv_wait+0x15/0x20 [spl]
[ 240.450186] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 240.450205] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 240.450225] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.450244] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.450264] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.450283] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.450302] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.450322] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.450342] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 240.450361] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 240.450364] [] ? __switch_to+0xf5/0x360
[ 240.450383] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 240.450402] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 240.450409] [] taskq_thread+0x236/0x4b0 [spl]
[ 240.450412] [] ? try_to_wake_up+0x200/0x200
[ 240.450418] [] ? task_done+0x160/0x160 [spl]
[ 240.450421] [] kthread+0x8c/0xa0
[ 240.450424] [] kernel_thread_helper+0x4/0x10
[ 240.450427] [] ? flush_kthread_worker+0xa0/0xa0
[ 240.450429] [] ? gs_change+0x13/0x13
[ 240.450469] INFO: task zfs:3011 blocked for more than 120 seconds.
[ 240.450831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.451277] zfs D 0000000000000000 0 3011 2938 0x00000000
[ 240.451280] ffff8803e78eb528 0000000000000082 ffff8803e78eb508 ffffffffa00e5409
[ 240.451284] ffff8803e78ebfd8 ffff8803e78ebfd8 ffff8803e78ebfd8 00000000000127c0
[ 240.451288] ffff8803ffe30000 ffff8803f22c1700 0000000000000282 ffff880409e89200
[ 240.451291] Call Trace:
[ 240.451298] [] ? taskq_find+0x169/0x260 [spl]
[ 240.451301] [] schedule+0x3f/0x60
[ 240.451308] [] taskq_wait_id+0x65/0xa0 [spl]
[ 240.451311] [] ? add_wait_queue+0x60/0x60
[ 240.451329] [] ? dmu_recv_begin_sync+0x360/0x360 [zfs]
[ 240.451348] [] ? dmu_recv_begin_sync+0x360/0x360 [zfs]
[ 240.451376] [] spa_taskq_dispatch_sync+0x7a/0xa0 [zfs]
[ 240.451395] [] dump_bytes+0x42/0x50 [zfs]
[ 240.451413] [] dump_free+0x171/0x1c0 [zfs]
[ 240.451432] [] backup_cb+0x76d/0x870 [zfs]
[ 240.451437] [] ? __wake_up+0x53/0x70
[ 240.451456] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 240.451475] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.451495] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.451514] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.451536] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.451556] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.451575] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.451595] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 240.451614] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 240.451621] [] ? taskq_dispatch+0x72/0x360 [spl]
[ 240.451640] [] traverse_impl+0x171/0x360 [zfs]
[ 240.451659] [] ? dmu_recv_begin_sync+0x360/0x360 [zfs]
[ 240.451678] [] traverse_dataset+0x44/0x50 [zfs]
[ 240.451697] [] ? dmu_send_impl+0x550/0x550 [zfs]
[ 240.451716] [] dmu_send_impl+0x3ab/0x550 [zfs]
[ 240.451735] [] dmu_send_obj+0xc5/0x120 [zfs]
[ 240.451763] [] zfs_ioc_send+0xb8/0x260 [zfs]
[ 240.451769] [] ? strdup+0x83/0x90 [spl]
[ 240.451795] [] zfsdev_ioctl+0x491/0x500 [zfs]
[ 240.451800] [] ? kmem_cache_free+0x2f/0x110
[ 240.451804] [] do_vfs_ioctl+0x8a/0x340
[ 240.451808] [] ? do_munmap+0x1f3/0x2f0
[ 240.451811] [] sys_ioctl+0x91/0xa0
[ 240.451814] [] ? do_device_not_available+0xe/0x10
[ 240.451818] [] system_call_fastpath+0x16/0x1b
[ 240.451820] INFO: task zfs:3069 blocked for more than 120 seconds.
[ 240.452196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.452642] zfs D ffffffff81806200 0 3069 2937 0x00000000
[ 240.452646] ffff8803e3eb13b8 0000000000000082 ffff8803e3eb1398 ffffffffa00e5409
[ 240.452649] ffff8803e3eb1fd8 ffff8803e3eb1fd8 ffff8803e3eb1fd8 00000000000127c0
[ 240.452653] ffff880415561700 ffff880410554500 0000000000000286 ffff880409399000
[ 240.452656] Call Trace:
[ 240.452664] [] ? taskq_find+0x169/0x260 [spl]
[ 240.452667] [] schedule+0x3f/0x60
[ 240.452673] [] taskq_wait_id+0x65/0xa0 [spl]
[ 240.452676] [] ? add_wait_queue+0x60/0x60
[ 240.452694] [] ? dmu_recv_begin_sync+0x360/0x360 [zfs]
[ 240.452713] [] ? dmu_recv_begin_sync+0x360/0x360 [zfs]
[ 240.452738] [] spa_taskq_dispatch_sync+0x7a/0xa0 [zfs]
[ 240.452759] [] dump_bytes+0x42/0x50 [zfs]
[ 240.452779] [] backup_cb+0x34b/0x870 [zfs]
[ 240.452784] [] ? __wake_up+0x53/0x70
[ 240.452805] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 240.452822] [] ? arc_read+0x8c0/0x8c0 [zfs]
[ 240.452843] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.452864] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 240.452885] [] traverse_visitbp+0x53d/0x6d0 [zfs]
[ 240.452906] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.452927] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.452948] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.452970] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.452991] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.453013] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 240.453034] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 240.453055] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 240.453064] [] ? taskq_dispatch+0x72/0x360 [spl]
[ 240.453085] [] traverse_impl+0x171/0x360 [zfs]
[ 240.453105] [] ? dmu_recv_begin_sync+0x360/0x360 [zfs]
[ 240.453126] [] traverse_dataset+0x44/0x50 [zfs]
[ 240.453147] [] ? dmu_send_impl+0x550/0x550 [zfs]
[ 240.453168] [] dmu_send_impl+0x3ab/0x550 [zfs]
[ 240.453189] [] dmu_send_obj+0xc5/0x120 [zfs]
[ 240.453216] [] zfs_ioc_send+0xb8/0x260 [zfs]
[ 240.453224] [] ? strdup+0x83/0x90 [spl]
[ 240.453252] [] zfsdev_ioctl+0x491/0x500 [zfs]
[ 240.453257] [] do_vfs_ioctl+0x8a/0x340
[ 240.453261] [] ? sys_ioctl+0x3c/0xa0
[ 240.453265] [] sys_ioctl+0x91/0xa0
[ 240.453270] [] ? do_device_not_available+0xe/0x10
[ 240.453275] [] system_call_fastpath+0x16/0x1b
[ 360.452022] INFO: task spl_system_task:332 blocked for more than 120 seconds.
[ 360.452435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.452882] spl_system_task D ffffffff81806200 0 332 2 0x00000000
[ 360.452887] ffff880413379710 0000000000000046 ffff8804133796f0 ffffffffa019146e
[ 360.452891] ffff880413379fd8 ffff880413379fd8 ffff880413379fd8 00000000000127c0
[ 360.452895] ffff8804155b9700 ffff880410e9ae00 ffff880413379720 ffff8803f55d4620
[ 360.452898] Call Trace:
[ 360.452944] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 360.452951] [] schedule+0x3f/0x60
[ 360.452964] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 360.452969] [] ? add_wait_queue+0x60/0x60
[ 360.452976] [] __cv_wait+0x15/0x20 [spl]
[ 360.452997] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 360.453016] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 360.453035] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.453055] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.453074] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.453094] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.453113] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.453132] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.453152] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 360.453171] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 360.453175] [] ? dequeue_task_fair+0xb7/0x100
[ 360.453180] [] ? __switch_to+0xf5/0x360
[ 360.453199] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 360.453218] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 360.453225] [] taskq_thread+0x236/0x4b0 [spl]
[ 360.453229] [] ? try_to_wake_up+0x200/0x200
[ 360.453236] [] ? task_done+0x160/0x160 [spl]
[ 360.453239] [] kthread+0x8c/0xa0
[ 360.453244] [] kernel_thread_helper+0x4/0x10
[ 360.453246] [] ? flush_kthread_worker+0xa0/0xa0
[ 360.453249] [] ? gs_change+0x13/0x13
[ 360.453252] INFO: task spl_system_task:333 blocked for more than 120 seconds.
[ 360.453659] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.454106] spl_system_task D ffffffff81806200 0 333 2 0x00000000
[ 360.454109] ffff88041337b710 0000000000000046 ffff88041337b6f0 ffffffffa019146e
[ 360.454112] ffff88041337bfd8 ffff88041337bfd8 ffff88041337bfd8 00000000000127c0
[ 360.454116] ffff880415581700 ffff880410e98000 ffff88041337b720 ffff8803e3d291a0
[ 360.454120] Call Trace:
[ 360.454134] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 360.454137] [] schedule+0x3f/0x60
[ 360.454144] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 360.454147] [] ? add_wait_queue+0x60/0x60
[ 360.454154] [] __cv_wait+0x15/0x20 [spl]
[ 360.454172] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 360.454192] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 360.454211] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.454231] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.454250] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.454270] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.454289] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.454309] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 360.454328] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 360.454347] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 360.454350] [] ? __switch_to+0xf5/0x360
[ 360.454369] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 360.454388] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 360.454395] [] taskq_thread+0x236/0x4b0 [spl]
[ 360.454398] [] ? try_to_wake_up+0x200/0x200
[ 360.454405] [] ? task_done+0x160/0x160 [spl]
[ 360.454407] [] kthread+0x8c/0xa0
[ 360.454411] [] kernel_thread_helper+0x4/0x10
[ 360.454413] [] ? flush_kthread_worker+0xa0/0xa0
[ 360.454416] [] ? gs_change+0x13/0x13
[ 480.452024] INFO: task spl_system_task:332 blocked for more than 120 seconds.
[ 480.452442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.452899] spl_system_task D ffffffff81806200 0 332 2 0x00000000
[ 480.452904] ffff880413379710 0000000000000046 ffff8804133796f0 ffffffffa019146e
[ 480.452908] ffff880413379fd8 ffff880413379fd8 ffff880413379fd8 00000000000127c0
[ 480.452912] ffff8804155b9700 ffff880410e9ae00 ffff880413379720 ffff8803f55d4620
[ 480.452916] Call Trace:
[ 480.452975] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 480.452982] [] schedule+0x3f/0x60
[ 480.452999] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 480.453005] [] ? add_wait_queue+0x60/0x60
[ 480.453012] [] __cv_wait+0x15/0x20 [spl]
[ 480.453033] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 480.453053] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 480.453072] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.453091] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.453111] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.453130] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.453150] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.453169] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.453188] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 480.453207] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 480.453212] [] ? dequeue_task_fair+0xb7/0x100
[ 480.453217] [] ? __switch_to+0xf5/0x360
[ 480.453236] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 480.453255] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 480.453262] [] taskq_thread+0x236/0x4b0 [spl]
[ 480.453266] [] ? try_to_wake_up+0x200/0x200
[ 480.453273] [] ? task_done+0x160/0x160 [spl]
[ 480.453275] [] kthread+0x8c/0xa0
[ 480.453280] [] kernel_thread_helper+0x4/0x10
[ 480.453283] [] ? flush_kthread_worker+0xa0/0xa0
[ 480.453286] [] ? gs_change+0x13/0x13
[ 480.453288] INFO: task spl_system_task:333 blocked for more than 120 seconds.
[ 480.453697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.454143] spl_system_task D ffffffff81806200 0 333 2 0x00000000
[ 480.454147] ffff88041337b710 0000000000000046 ffff88041337b6f0 ffffffffa019146e
[ 480.454150] ffff88041337bfd8 ffff88041337bfd8 ffff88041337bfd8 00000000000127c0
[ 480.454154] ffff880415581700 ffff880410e98000 ffff88041337b720 ffff8803e3d291a0
[ 480.454158] Call Trace:
[ 480.454172] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 480.454175] [] schedule+0x3f/0x60
[ 480.454182] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 480.454185] [] ? add_wait_queue+0x60/0x60
[ 480.454192] [] __cv_wait+0x15/0x20 [spl]
[ 480.454211] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 480.454230] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 480.454249] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.454269] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.454288] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.454308] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.454327] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.454346] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 480.454366] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 480.454385] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 480.454387] [] ? __switch_to+0xf5/0x360
[ 480.454406] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 480.454426] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 480.454433] [] taskq_thread+0x236/0x4b0 [spl]
[ 480.454436] [] ? try_to_wake_up+0x200/0x200
[ 480.454442] [] ? task_done+0x160/0x160 [spl]
[ 480.454445] [] kthread+0x8c/0xa0
[ 480.454448] [] kernel_thread_helper+0x4/0x10
[ 480.454451] [] ? flush_kthread_worker+0xa0/0xa0
[ 480.454453] [] ? gs_change+0x13/0x13
[ 600.452058] INFO: task spl_system_task:332 blocked for more than 120 seconds.
[ 600.452480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.452927] spl_system_task D ffffffff81806200 0 332 2 0x00000000
[ 600.452932] ffff880413379710 0000000000000046 ffff8804133796f0 ffffffffa019146e
[ 600.452937] ffff880413379fd8 ffff880413379fd8 ffff880413379fd8 00000000000127c0
[ 600.452941] ffff8804155b9700 ffff880410e9ae00 ffff880413379720 ffff8803f55d4620
[ 600.452945] Call Trace:
[ 600.453042] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 600.453049] [] schedule+0x3f/0x60
[ 600.453075] [] cv_wait_common+0xfd/0x1b0 [spl]
[ 600.453081] [] ? add_wait_queue+0x60/0x60
[ 600.453088] [] __cv_wait+0x15/0x20 [spl]
[ 600.453110] [] traverse_prefetcher+0xab/0x140 [zfs]
[ 600.453129] [] traverse_visitbp+0x2b6/0x6d0 [zfs]
[ 600.453148] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 600.453168] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 600.453187] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 600.453206] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 600.453226] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 600.453245] [] traverse_visitbp+0x430/0x6d0 [zfs]
[ 600.453265] [] traverse_dnode+0x6c/0xf0 [zfs]
[ 600.453284] [] traverse_visitbp+0x5ef/0x6d0 [zfs]
[ 600.453289] [] ? dequeue_task_fair+0xb7/0x100
[ 600.453294] [] ? __switch_to+0xf5/0x360
[ 600.453313] [] traverse_prefetch_thread+0x83/0xc0 [zfs]
[ 600.453332] [] ? prefetch_dnode_metadata+0xb0/0xb0 [zfs]
[ 600.453339] [] taskq_thread+0x236/0x4b0 [spl]
[ 600.453343] [] ? try_to_wake_up+0x200/0x200
[ 600.453350] [] ? task_done+0x160/0x160 [spl]
[ 600.453353] [] kthread+0x8c/0xa0
[ 600.453359] [] kernel_thread_helper+0x4/0x10
[ 600.453362] [] ? flush_kthread_worker+0xa0/0xa0
[ 600.453364] [] ? gs_change+0x13/0x13
[ 600.453367] INFO: task spl_system_task:333 blocked for more than 120 seconds.
[ 600.453777] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.454223] spl_system_task D ffffffff81806200 0 333 2 0x00000000
[ 600.454227] ffff88041337b710 0000000000000046 ffff88041337b6f0 ffffffffa019146e
[ 600.454230] ffff88041337bfd8 ffff88041337bfd8 ffff88041337bfd8 00000000000127c0
[ 600.454234] ffff880415581700 ffff880410e98000 ffff88041337b720 ffff8803e3d291a0
[ 600.454238] Call Trace:
[ 600.454252] [] ? buf_hash_find+0x7e/0x100 [zfs]
[ 600.454255] [

@ldonzis
Author

ldonzis commented Jan 23, 2015

A little more checking... it appears to be the zfs send operations that are causing the hang. And interestingly, rebooting the destination machine cleared up the problem.

@dweeezil
Contributor

Right, the blockage you're seeing is happening because the receiver is itself either blocked or not absorbing the send stream quickly enough. When this happens, you should examine the state of the corresponding zfs receive command on the receive side with cat /proc/<pid>/stack (for starters).
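As a rough sketch of that first diagnostic step, the helper below prints the kernel stack for every process whose command line matches a pattern. The function names and the PROC_ROOT override are made up for illustration; only pgrep and /proc/<pid>/stack are real, and reading the stack file normally requires root.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical override for experimenting; it is normally /proc.
PROC_ROOT=${PROC_ROOT:-/proc}

# Print the kernel stack of one PID, or a note if it cannot be read.
print_stack() {
    pid=$1
    stack_file="$PROC_ROOT/$pid/stack"
    echo "== pid $pid =="
    if [ -r "$stack_file" ]; then
        cat "$stack_file"
    else
        echo "(cannot read $stack_file; root is usually required)"
    fi
}

# Print the stack of every process whose full command line matches
# the given extended regular expression.
dump_matching_stacks() {
    pattern=$1
    for pid in $(pgrep -f "$pattern"); do
        print_stack "$pid"
    done
}

# Example, run as root on the receiving host:
#   dump_matching_stacks 'zfs (receive|recv)'
```

Running it a few times some seconds apart should show whether the receive process stays parked in the same frames.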

@dweeezil
Contributor

@ldonzis Could you try either the current master code or apply a585f2f from #3348 and see whether it fixes the problem (assuming you can still reproduce it)?

@ldonzis
Author

ldonzis commented May 6, 2015

Hi, Sorry for the delay.

I’ve tried to reproduce it, but no luck.

I tried updating just with “apt-get dist-upgrade” and it did download and build new dkms, spl, and zfs.

But now when the system boots the entire pool is gone. I can import it and it’s ok, but if I reboot, it’s gone again.

Not that this is the place to report this, but I just wanted to get back to you that I’m not able to test the fix at the moment. But I’ll keep trying.

Thanks,
lew

On Apr 28, 2015, at 11:51 AM, Tim Chase wrote:

@ldonzis Could you try either the current master code or apply a585f2f from #3348 and see whether it fixes the problem (assuming you can still reproduce it).

@beren12
Contributor

beren12 commented May 6, 2015

Make sure the zfs init scripts are +x and there are symlinks in the correct run levels. This bit me when I replaced the script on my own :-)
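A quick way to run that check might look like the sketch below. It assumes the classic SysV layout (scripts in /etc/init.d, links named S??name or K??name under /etc/rc?.d); the function name and the optional second argument are invented for illustration, and jobs managed purely by upstart will not show runlevel links.

```shell
#!/bin/sh
# check_init_script NAME [ETC_DIR]
# Reports whether ETC_DIR/init.d/NAME is executable and lists any
# runlevel symlinks for it. ETC_DIR defaults to /etc.
check_init_script() {
    name=$1
    etc=${2:-/etc}
    script="$etc/init.d/$name"
    if [ -x "$script" ]; then
        echo "$name: executable"
    else
        echo "$name: missing or not executable"
    fi
    found=0
    for link in "$etc"/rc?.d/[SK][0-9][0-9]"$name"; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        echo "  $link"
        found=1
    done
    [ "$found" -eq 1 ] || echo "  no runlevel symlinks found"
}

# Example:
#   check_init_script zfs-mount
#   check_init_script zfs-share
```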

@ldonzis
Author

ldonzis commented May 6, 2015

I apologize if continuing this thread under the current subject is a bad idea… if you prefer, I could start a new thread for this topic.

But in the meantime…

On May 6, 2015, at 7:30 AM, beren12 wrote:

Make sure the zfs init scripts are +x and there are symlinks in the correct run levels. This bit me when I replaced the script on my own :-)

That all looks ok (and I hadn’t replaced any of those scripts), but in the previous version, in /etc/init.d there were only “zfs-mount” and “zfs-share”. In the new version, there is an additional file, “zpool-import”, which is symlinked to upstart.

So, checking /var/log/upstart/zpool-import.log, we have, for each reboot:

/proc/self/fd/9: line 123: cannot create temp file for here-document: Read-only file system

So, on a whim (you can tell I have no idea what I’m doing!), I tried changing /etc/init/zpool-import.conf from this:

while read NAME VALUE
do
...
done <<-HERE
$(zdb -C 2>/dev/.initramfs/zdb.stderr)
:
HERE

to this:

echo "$(zdb -C)"$'\n:' | while read NAME VALUE
do
...
done

And now the pool comes up properly on boot.

However, the pipeline causes the “while” to execute in a sub-shell, which might not be desirable, and it looks like /var/run (or /run) is writable at boot time, so I also tried:

echo "$(zdb -C)"$'\n:' >/var/run/zpool-import.list
while read NAME VALUE
do
...
done </var/run/zpool-import.list

and that also works.
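To illustrate the sub-shell concern: in bash and dash with default settings, a "while" loop on the right-hand side of a pipe runs in a child process, so variables it sets are lost when the loop ends, whereas a loop fed by a file redirect runs in the current shell. The two functions below are a made-up minimal demonstration, not part of the zpool-import job.

```shell
#!/bin/sh
# Loop fed by a pipe: the loop body runs in a sub-shell (bash/dash defaults),
# so the increments to "count" are not visible afterwards.
count_with_pipeline() {
    count=0
    printf 'a\nb\nc\n' | while read -r line; do
        count=$((count + 1))
    done
    echo "$count"
}

# Loop fed by a redirect from a file: the loop runs in the current shell,
# so "count" keeps its final value.
count_with_redirect() {
    count=0
    list=$(mktemp)
    printf 'a\nb\nc\n' > "$list"
    while read -r line; do
        count=$((count + 1))
    done < "$list"
    rm -f "$list"
    echo "$count"
}

count_with_pipeline   # prints 0 in bash and dash by default
count_with_redirect   # prints 3
```

If the elided loop body sets variables that are used later in the job, the pipeline variant would silently drop them, which is presumably why the redirect form is the safer of the two.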

And finally, I went back to the original script and just added this one line:

TMPDIR=/var/run # /tmp is not writable during boot!
while read NAME VALUE
...

and that also works. I’m sure someone out there knows the correct way to solve this better than I do.
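The one-line fix hard-codes /var/run. A slightly more defensive variant, offered only as a sketch, could pick the first writable directory from a list of candidates before the here-document is reached. The helper name and the candidate list are assumptions; the fact that TMPDIR fixed the job here suggests the shell running it honors TMPDIR for here-document temp files, but that depends on the shell.

```shell
#!/bin/sh
# first_writable_dir DIR...
# Prints the first argument that is an existing, writable directory.
# Returns non-zero if none qualifies.
first_writable_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Hypothetical use near the top of the job script:
#   TMPDIR=$(first_writable_dir /run /var/run /tmp) && export TMPDIR
```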

BTW, I don’t know if this is relevant, but a long time ago we converted from an LSI RAID controller, where we had put each drive in its own RAID0 so that ZFS would see each drive individually, to an LSI HBA. At that time, I found that we had to set ZFS_MOUNT='yes' in /etc/default/zfs. I never spent any time finding out why, but it’s been that way for years. The new zpool-import script appears to solve whatever that problem was.

Thanks,
lew

@behlendorf
Contributor

Closing; this is believed to be resolved in the 0.6.5.x series.
