
0.6.5.2 hangs on large rsync jobs #3867

Closed
rgmiller opened this issue Oct 1, 2015 · 16 comments

@rgmiller

rgmiller commented Oct 1, 2015

I'm afraid 0.6.5.2 still has problems with large rsync jobs (from BackupPC). My server isn't completely locked up; I can at least log in and poke around. All four cores (the CPU is a quad-core Xeon) are sitting at 100%, though. Access to the zpool that was being written to appears to be completely blocked (a simple 'ls' hung), but I can still read and write to a separate zpool on the same server.

This may be related to issue #3808. (It's the same computer running the same BackupPC jobs anyway.)

I'll paste the stack traces below.

@rgmiller
Author

rgmiller commented Oct 1, 2015

Stack traces for the hung processes (from dmesg):

[23617.581529] INFO: task BackupPC_dump:20068 blocked for more than 120 seconds.
[23617.581549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23617.581566] BackupPC_dump D ffff88041e213680 0 20068 3636 0x00000080
[23617.581569] ffff8801fffd7e38 0000000000000082 ffff8801c383ad80 ffff8801fffd7fd8
[23617.581570] ffff8801fffd7fd8 ffff8801fffd7fd8 ffff8801c383ad80 ffff8801c383ad80
[23617.581571] ffff8804045238b8 ffff8804045238c0 ffffffff00000000 ffff8804045238c8
[23617.581573] Call Trace:
[23617.581578] [] schedule+0x29/0x70
[23617.581580] [] rwsem_down_write_failed+0x115/0x220
[23617.581583] [] call_rwsem_down_write_failed+0x13/0x20
[23617.581585] [] ? down_write+0x2d/0x30
[23617.581587] [] vm_munmap+0x33/0x60
[23617.581588] [] SyS_munmap+0x22/0x30
[23617.581590] [] system_call_fastpath+0x16/0x1b
[23737.465775] INFO: task BackupPC_dump:20068 blocked for more than 120 seconds.
[23737.465793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23737.465811] BackupPC_dump D ffff88041e213680 0 20068 3636 0x00000080
[23737.465814] ffff8801fffd7e38 0000000000000082 ffff8801c383ad80 ffff8801fffd7fd8
[23737.465815] ffff8801fffd7fd8 ffff8801fffd7fd8 ffff8801c383ad80 ffff8801c383ad80
[23737.465817] ffff8804045238b8 ffff8804045238c0 ffffffff00000000 ffff8804045238c8
[23737.465818] Call Trace:
[23737.465823] [] schedule+0x29/0x70
[23737.465825] [] rwsem_down_write_failed+0x115/0x220
[23737.465828] [] call_rwsem_down_write_failed+0x13/0x20
[23737.465830] [] ? down_write+0x2d/0x30
[23737.465832] [] vm_munmap+0x33/0x60
[23737.465834] [] SyS_munmap+0x22/0x30
[23737.465836] [] system_call_fastpath+0x16/0x1b
[25295.964572] INFO: task BackupPC_dump:20135 blocked for more than 120 seconds.
[25295.964591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25295.964609] BackupPC_dump D ffff88041e313680 0 20135 20072 0x00000080
[25295.964612] ffff88015025fb70 0000000000000082 ffff880024778000 ffff88015025ffd8
[25295.964613] ffff88015025ffd8 ffff88015025ffd8 ffff880024778000 ffff8801f308fa88
[25295.964615] ffff8801f308fa8c ffff880024778000 00000000ffffffff ffff8801f308fa90
[25295.964617] Call Trace:
[25295.964622] [] schedule_preempt_disabled+0x29/0x70
[25295.964624] [] __mutex_lock_slowpath+0xc5/0x1c0
[25295.964625] [] mutex_lock+0x1f/0x2f
[25295.964627] [] lookup_slow+0x33/0xa7
[25295.964630] [] link_path_walk+0x80f/0x8b0
[25295.964632] [] ? __d_lookup+0x120/0x160
[25295.964634] [] path_lookupat+0x6b/0x7a0
[25295.964636] [] ? kmem_cache_alloc+0x35/0x1d0
[25295.964637] [] ? getname_flags+0x4f/0x1a0
[25295.964638] [] filename_lookup+0x2b/0xc0
[25295.964640] [] user_path_at_empty+0x67/0xc0
[25295.964643] [] ? from_kgid_munged+0x12/0x20
[25295.964645] [] ? cp_new_stat+0x14f/0x180
[25295.964646] [] user_path_at+0x11/0x20
[25295.964647] [] vfs_fstatat+0x63/0xc0
[25295.964649] [] SYSC_newstat+0x2e/0x60
[25295.964651] [] ? __audit_syscall_exit+0x1e6/0x280
[25295.964652] [] SyS_newstat+0xe/0x10
[25295.964654] [] system_call_fastpath+0x16/0x1b
[25645.496136] perf interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[26734.576014] INFO: task spl_dynamic_tas:123 blocked for more than 120 seconds.
[26734.576040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[26734.576058] spl_dynamic_tas D ffff88041e313680 0 123 2 0x00000000
[26734.576061] ffff88040404bb70 0000000000000046 ffff880404692d80 ffff88040404bfd8
[26734.576063] ffff88040404bfd8 ffff88040404bfd8 ffff880404692d80 ffff88040404bcd8
[26734.576064] ffff88040404bce0 7fffffffffffffff ffff880404692d80 ffff880406b92500
[26734.576066] Call Trace:
[26734.576071] [] schedule+0x29/0x70
[26734.576073] [] schedule_timeout+0x209/0x2d0
[26734.576077] [] ? check_preempt_curr+0x85/0xa0
[26734.576078] [] ? ttwu_do_wakeup+0x19/0xd0
[26734.576080] [] wait_for_completion+0x116/0x170
[26734.576082] [] ? wake_up_state+0x20/0x20
[26734.576101] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[26734.576104] [] kthread_create_on_node+0xa8/0x140
[26734.576107] [] ? string.isra.6+0x3b/0xf0
[26734.576113] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[26734.576120] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[26734.576125] [] spl_kthread_create+0x9c/0xf0 [spl]
[26734.576130] [] taskq_thread_create+0x6b/0x110 [spl]
[26734.576135] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[26734.576139] [] taskq_thread+0x21e/0x420 [spl]
[26734.576141] [] ? wake_up_state+0x20/0x20
[26734.576145] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[26734.576146] [] kthread+0xcf/0xe0
[26734.576148] [] ? kthread_create_on_node+0x140/0x140
[26734.576150] [] ret_from_fork+0x58/0x90
[26734.576151] [] ? kthread_create_on_node+0x140/0x140
[26734.576174] INFO: task kworker/1:0:12006 blocked for more than 120 seconds.
[26734.576190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[26734.576207] kworker/1:0 D ffff88041e293680 0 12006 2 0x00000080
[26734.576212] ffff88000fcbfbf0 0000000000000046 ffff880185d5ad80 ffff88000fcbffd8
[26734.576213] ffff88000fcbffd8 ffff88000fcbffd8 ffff880185d5ad80 ffff88000fcbfd58
[26734.576214] ffff88000fcbfd60 7fffffffffffffff ffff880185d5ad80 ffff88000fcbfdd8
[26734.576216] Call Trace:
[26734.576218] [] schedule+0x29/0x70
[26734.576220] [] schedule_timeout+0x209/0x2d0
[26734.576221] [] ? sched_clock_cpu+0xb5/0x100
[26734.576223] [] ? check_preempt_curr+0x75/0xa0
[26734.576225] [] ? ttwu_do_wakeup+0x19/0xd0
[26734.576226] [] wait_for_completion+0x116/0x170
[26734.576227] [] ? wake_up_state+0x20/0x20
[26734.576229] [] kthread_create_on_node+0xa8/0x140
[26734.576231] [] ? rescuer_thread+0x400/0x400
[26734.576233] [] create_worker+0xea/0x250
[26734.576234] [] manage_workers.isra.25+0xf6/0x2d0
[26734.576236] [] worker_thread+0x339/0x400
[26734.576237] [] ? rescuer_thread+0x400/0x400
[26734.576238] [] kthread+0xcf/0xe0
[26734.576240] [] ? kthread_create_on_node+0x140/0x140
[26734.576241] [] ret_from_fork+0x58/0x90
[26734.576243] [] ? kthread_create_on_node+0x140/0x140
[30091.338924] INFO: task spl_dynamic_tas:123 blocked for more than 120 seconds.
[30091.338946] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30091.338965] spl_dynamic_tas D ffff88041e213680 0 123 2 0x00000000
[30091.338967] ffff88040404bb70 0000000000000046 ffff880404692d80 ffff88040404bfd8
[30091.338969] ffff88040404bfd8 ffff88040404bfd8 ffff880404692d80 ffff88040404bcd8
[30091.338971] ffff88040404bce0 7fffffffffffffff ffff880404692d80 ffff880406b92500
[30091.338972] Call Trace:
[30091.338977] [] schedule+0x29/0x70
[30091.338980] [] schedule_timeout+0x209/0x2d0
[30091.338982] [] ? sched_clock_cpu+0xb5/0x100
[30091.338985] [] ? check_preempt_curr+0x75/0xa0
[30091.338987] [] ? ttwu_do_wakeup+0x19/0xd0
[30091.338989] [] wait_for_completion+0x116/0x170
[30091.338990] [] ? wake_up_state+0x20/0x20
[30091.339010] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30091.339013] [] kthread_create_on_node+0xa8/0x140
[30091.339016] [] ? string.isra.6+0x3b/0xf0
[30091.339022] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30091.339029] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30091.339035] [] spl_kthread_create+0x9c/0xf0 [spl]
[30091.339041] [] taskq_thread_create+0x6b/0x110 [spl]
[30091.339046] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[30091.339052] [] taskq_thread+0x21e/0x420 [spl]
[30091.339053] [] ? wake_up_state+0x20/0x20
[30091.339059] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30091.339060] [] kthread+0xcf/0xe0
[30091.339062] [] ? kthread_create_on_node+0x140/0x140
[30091.339064] [] ret_from_fork+0x58/0x90
[30091.339066] [] ? kthread_create_on_node+0x140/0x140
[30211.223201] INFO: task spl_dynamic_tas:123 blocked for more than 120 seconds.
[30211.223220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30211.223239] spl_dynamic_tas D ffff88041e213680 0 123 2 0x00000000
[30211.223242] ffff88040404bb70 0000000000000046 ffff880404692d80 ffff88040404bfd8
[30211.223244] ffff88040404bfd8 ffff88040404bfd8 ffff880404692d80 ffff88040404bcd8
[30211.223245] ffff88040404bce0 7fffffffffffffff ffff880404692d80 ffff880406b92500
[30211.223247] Call Trace:
[30211.223252] [] schedule+0x29/0x70
[30211.223254] [] schedule_timeout+0x209/0x2d0
[30211.223257] [] ? sched_clock_cpu+0xb5/0x100
[30211.223259] [] ? check_preempt_curr+0x75/0xa0
[30211.223261] [] ? ttwu_do_wakeup+0x19/0xd0
[30211.223263] [] wait_for_completion+0x116/0x170
[30211.223264] [] ? wake_up_state+0x20/0x20
[30211.223283] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30211.223285] [] kthread_create_on_node+0xa8/0x140
[30211.223288] [] ? string.isra.6+0x3b/0xf0
[30211.223293] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30211.223298] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30211.223303] [] spl_kthread_create+0x9c/0xf0 [spl]
[30211.223307] [] taskq_thread_create+0x6b/0x110 [spl]
[30211.223312] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[30211.223317] [] taskq_thread+0x21e/0x420 [spl]
[30211.223318] [] ? wake_up_state+0x20/0x20
[30211.223322] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30211.223324] [] kthread+0xcf/0xe0
[30211.223325] [] ? kthread_create_on_node+0x140/0x140
[30211.223328] [] ret_from_fork+0x58/0x90
[30211.223329] [] ? kthread_create_on_node+0x140/0x140
[30331.108219] INFO: task spl_dynamic_tas:123 blocked for more than 120 seconds.
[30331.108240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30331.108257] spl_dynamic_tas D ffff88041e213680 0 123 2 0x00000000
[30331.108260] ffff88040404bb70 0000000000000046 ffff880404692d80 ffff88040404bfd8
[30331.108262] ffff88040404bfd8 ffff88040404bfd8 ffff880404692d80 ffff88040404bcd8
[30331.108263] ffff88040404bce0 7fffffffffffffff ffff880404692d80 ffff880406b92500
[30331.108265] Call Trace:
[30331.108270] [] schedule+0x29/0x70
[30331.108273] [] schedule_timeout+0x209/0x2d0
[30331.108275] [] ? sched_clock_cpu+0xb5/0x100
[30331.108278] [] ? check_preempt_curr+0x75/0xa0
[30331.108279] [] ? ttwu_do_wakeup+0x19/0xd0
[30331.108281] [] wait_for_completion+0x116/0x170
[30331.108283] [] ? wake_up_state+0x20/0x20
[30331.108304] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30331.108306] [] kthread_create_on_node+0xa8/0x140
[30331.108310] [] ? string.isra.6+0x3b/0xf0
[30331.108316] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30331.108323] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30331.108329] [] spl_kthread_create+0x9c/0xf0 [spl]
[30331.108333] [] taskq_thread_create+0x6b/0x110 [spl]
[30331.108338] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[30331.108343] [] taskq_thread+0x21e/0x420 [spl]
[30331.108344] [] ? wake_up_state+0x20/0x20
[30331.108349] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[30331.108350] [] kthread+0xcf/0xe0
[30331.108352] [] ? kthread_create_on_node+0x140/0x140
[30331.108354] [] ret_from_fork+0x58/0x90
[30331.108355] [] ? kthread_create_on_node+0x140/0x140
[36085.587524] INFO: task kswapd0:46 blocked for more than 120 seconds.
[36085.587544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36085.587562] kswapd0 D ffff88041e213680 0 46 2 0x00000000
[36085.587565] ffff88040445b790 0000000000000046 ffff8804083fdb00 ffff88040445bfd8
[36085.587567] ffff88040445bfd8 ffff88040445bfd8 ffff8804083fdb00 ffff88040194a9f0
[36085.587568] ffff88040194a9f4 ffff8804083fdb00 00000000ffffffff ffff88040194a9f8
[36085.587570] Call Trace:
[36085.587576] [] schedule_preempt_disabled+0x29/0x70
[36085.587578] [] __mutex_lock_slowpath+0xc5/0x1c0
[36085.587580] [] mutex_lock+0x1f/0x2f
[36085.587616] [] arc_release+0x301/0x490 [zfs]
[36085.587633] [] dbuf_dirty+0x7c0/0x980 [zfs]
[36085.587646] [] dmu_buf_will_dirty+0x56/0xa0 [zfs]
[36085.587669] [] zap_get_leaf_byblk+0x132/0x2c0 [zfs]
[36085.587681] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[36085.587700] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[36085.587718] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[36085.587734] [] fzap_remove+0x3f/0xb0 [zfs]
[36085.587751] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[36085.587768] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[36085.587783] [] zap_remove+0x13/0x20 [zfs]
[36085.587800] [] zap_remove_int+0x54/0x80 [zfs]
[36085.587816] [] zfs_rmnode+0x224/0x350 [zfs]
[36085.587819] [] ? mutex_lock+0x12/0x2f
[36085.587836] [] zfs_zinactive+0x168/0x180 [zfs]
[36085.587852] [] zfs_inactive+0x67/0x240 [zfs]
[36085.587855] [] ? truncate_pagecache+0x59/0x60
[36085.587871] [] zpl_evict_inode+0x43/0x60 [zfs]
[36085.587875] [] evict+0xa7/0x170
[36085.587877] [] dispose_list+0x3e/0x50
[36085.587878] [] prune_icache_sb+0x163/0x320
[36085.587880] [] prune_super+0x16b/0x1a0
[36085.587882] [] shrink_slab+0x165/0x300
[36085.587884] [] ? vmpressure+0x21/0x90
[36085.587886] [] balance_pgdat+0x4b1/0x5e0
[36085.587887] [] kswapd+0x173/0x450
[36085.587890] [] ? wake_up_bit+0x30/0x30
[36085.587892] [] ? balance_pgdat+0x5e0/0x5e0
[36085.587893] [] kthread+0xcf/0xe0
[36085.587894] [] ? kthread_create_on_node+0x140/0x140
[36085.587897] [] ret_from_fork+0x58/0x90
[36085.587898] [] ? kthread_create_on_node+0x140/0x140
[36085.587912] INFO: task txg_quiesce:1089 blocked for more than 120 seconds.
[36085.587929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36085.587946] txg_quiesce D ffff88041e213680 0 1089 2 0x00000000
[36085.587948] ffff8803fcbe7d38 0000000000000046 ffff880406cde660 ffff8803fcbe7fd8
[36085.587949] ffff8803fcbe7fd8 ffff8803fcbe7fd8 ffff880406cde660 ffff8803fcc62148
[36085.587951] ffff8803fcc62038 ffff8803fcc62150 ffff8803fcc62060 0000000000000000
[36085.587952] Call Trace:
[36085.587954] [] schedule+0x29/0x70
[36085.587961] [] cv_wait_common+0x125/0x150 [spl]
[36085.587963] [] ? wake_up_bit+0x30/0x30
[36085.587968] [] __cv_wait+0x15/0x20 [spl]
[36085.587984] [] txg_quiesce_thread+0x2eb/0x400 [zfs]
[36085.588000] [] ? txg_init+0x2b0/0x2b0 [zfs]
[36085.588005] [] thread_generic_wrapper+0x71/0x80 [spl]
[36085.588009] [] ? __thread_exit+0x20/0x20 [spl]
[36085.588010] [] kthread+0xcf/0xe0
[36085.588012] [] ? kthread_create_on_node+0x140/0x140
[36085.588014] [] ret_from_fork+0x58/0x90
[36085.588015] [] ? kthread_create_on_node+0x140/0x140
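For reference, these reports come from the kernel's hung-task detector and can be pulled from the kernel log at any time; the sysrq step additionally dumps every task currently stuck in uninterruptible sleep. This is only a sketch: it assumes a root shell and that magic SysRq is enabled (kernel.sysrq).

# pull the hung-task reports the detector has already logged
dmesg | grep -A 40 'blocked for more than'

# ask the kernel to dump the stacks of all blocked (D-state) tasks,
# then read them back out of the log
echo w > /proc/sysrq-trigger
dmesg | tail -n 300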

@behlendorf
Contributor

@rgmiller sorry to hear that. It appears that you're still able to tickle a case where dynamically creating threads can wedge the system when there's significant memory pressure. Please set spl_taskq_thread_dynamic=0 and let us know if that resolves the issue.
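For anyone else following along, a sketch of how that option can be set as root. The sysfs path assumes the standard ZoL packaging, whether the running module accepts the change at runtime may vary (a module reload or reboot is the safe route), and the /etc/modprobe.d/spl.conf filename is just a convention.

# check the current value
cat /sys/module/spl/parameters/spl_taskq_thread_dynamic

# try to change it on the running module (may not take effect until reload)
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic

# make it persistent for the next module load / reboot
echo 'options spl spl_taskq_thread_dynamic=0' >> /etc/modprobe.d/spl.conf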

@behlendorf behlendorf added this to the 0.7.0 milestone Oct 2, 2015
@rgmiller
Author

rgmiller commented Oct 3, 2015

OK. I've set spl_taskq_thread_dynamic to 0. I'll post a follow-up after the server has run for a couple of days (or if it hangs).

@rgmiller
Author

rgmiller commented Oct 5, 2015

OK, the server has been running without major issues for 2 days now, so it appears that setting spl_taskq_thread_dynamic=0 is a reasonable workaround for now.

FWIW: things aren't perfect. I've got some anecdotal evidence that other processes can become starved for I/O when the BackupPC jobs are running. Specifically, I have a streaming music server on that machine that reads from a different zpool. I don't have any stack traces because nothing crashed, but it was definitely stuck in an I/O Wait state long enough for the individual players to play through their internal buffers and then stop. Both times this happened, it was right after a BackupPC job started.
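One rough way to watch for this kind of starvation while a backup runs (a sketch; iostat comes from the sysstat package):

# per-pool and per-vdev throughput/IOPS for all pools, every 5 seconds
zpool iostat -v 5

# per-device utilization and average wait times
iostat -x 5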

@behlendorf
Contributor

@rgmiller thanks for following up, this is exactly the kind of feedback we need. My suggestion would be to stick with spl_taskq_thread_dynamic=0 for now; it may become the default value in a future point release if we can't resolve the core issue with on-demand thread creation that you're seeing.

@mihaivint

Not sure if this helps, but the initial problem was also linked to my issue #3834. I'm still on 0.6.5.1 and managed to work around it by setting vm.min_free_kbytes to 1048576. I didn't try a lower value.
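For reference, that tunable can be changed on the fly with sysctl and made persistent in /etc/sysctl.conf (a sketch, run as root; the value is the one from the comment above):

# raise the amount of memory the kernel tries to keep free
sysctl -w vm.min_free_kbytes=1048576

# persist it across reboots
echo 'vm.min_free_kbytes = 1048576' >> /etc/sysctl.conf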

@rgmiller
Author

Just a quick follow-up: ZFS locked up again tonight. I'm actually running 0.6.5.3 at this point (with spl_taskq_thread_dynamic set to 0). It had been running fine since I set spl_taskq_thread_dynamic manually after @behlendorf suggested it. So it appears we've still got issues; they're just much less likely. dmesg text pasted below.

[430985.003872] INFO: task z_wr_iss:2300 blocked for more than 120 seconds.
[430985.003893] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[430985.003912] z_wr_iss D ffff88041e213680 0 2300 2 0x00000000
[430985.003915] ffff88040035baf0 0000000000000046 ffff88040705cfa0 ffff88040035bfd8
[430985.003917] ffff88040035bfd8 ffff88040035bfd8 ffff88040705cfa0 ffff8803f99d09d0
[430985.003918] ffff8803f99d09d4 ffff88040705cfa0 00000000ffffffff ffff8803f99d09d8
[430985.003920] Call Trace:
[430985.003925] [] schedule_preempt_disabled+0x29/0x70
[430985.003927] [] __mutex_lock_slowpath+0xc5/0x1c0
[430985.003929] [] mutex_lock+0x1f/0x2f
[430985.003969] [] vdev_queue_io+0x88/0x200 [zfs]
[430985.003992] [] ? zio_create+0x3fa/0x500 [zfs]
[430985.004012] [] zio_vdev_io_start+0x187/0x2e0 [zfs]
[430985.004031] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004051] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.004069] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.004087] [] ? zio_push_transform+0x39/0x90 [zfs]
[430985.004107] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[430985.004123] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004141] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.004157] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.004174] [] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[430985.004189] [] zio_execute+0xc8/0x180 [zfs]
[430985.004196] [] taskq_thread+0x21e/0x420 [spl]
[430985.004199] [] ? wake_up_state+0x20/0x20
[430985.004203] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[430985.004205] [] kthread+0xcf/0xe0
[430985.004206] [] ? kthread_create_on_node+0x140/0x140
[430985.004209] [] ret_from_fork+0x58/0x90
[430985.004210] [] ? kthread_create_on_node+0x140/0x140
[430985.004212] INFO: task z_wr_iss:2302 blocked for more than 120 seconds.
[430985.004228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[430985.004246] z_wr_iss D ffff88041e293680 0 2302 2 0x00000000
[430985.004248] ffff88040034baf0 0000000000000046 ffff88040705f1c0 ffff88040034bfd8
[430985.004250] ffff88040034bfd8 ffff88040034bfd8 ffff88040705f1c0 ffff8803f99d09d0
[430985.004251] ffff8803f99d09d4 ffff88040705f1c0 00000000ffffffff ffff8803f99d09d8
[430985.004252] Call Trace:
[430985.004255] [] schedule_preempt_disabled+0x29/0x70
[430985.004256] [] __mutex_lock_slowpath+0xc5/0x1c0
[430985.004258] [] mutex_lock+0x1f/0x2f
[430985.004274] [] vdev_queue_io+0x88/0x200 [zfs]
[430985.004290] [] ? zio_create+0x3fa/0x500 [zfs]
[430985.004305] [] zio_vdev_io_start+0x187/0x2e0 [zfs]
[430985.004319] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004335] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.004350] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.004365] [] ? zio_push_transform+0x39/0x90 [zfs]
[430985.004379] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[430985.004393] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004408] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.004423] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.004438] [] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[430985.004452] [] zio_execute+0xc8/0x180 [zfs]
[430985.004456] [] taskq_thread+0x21e/0x420 [spl]
[430985.004458] [] ? wake_up_state+0x20/0x20
[430985.004462] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[430985.004464] [] kthread+0xcf/0xe0
[430985.004465] [] ? kthread_create_on_node+0x140/0x140
[430985.004467] [] ret_from_fork+0x58/0x90
[430985.004468] [] ? kthread_create_on_node+0x140/0x140
[430985.004485] INFO: task txg_sync:2586 blocked for more than 120 seconds.
[430985.004501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[430985.004519] txg_sync D ffff88041e293680 0 2586 2 0x00000000
[430985.004520] ffff8803f8e97b90 0000000000000046 ffff8804006fa220 ffff8803f8e97fd8
[430985.004521] ffff8803f8e97fd8 ffff8803f8e97fd8 ffff8804006fa220 ffff88041e293f48
[430985.004523] ffff880366f97088 ffff880366f970c8 ffff880366f970b0 0000000000000001
[430985.004524] Call Trace:
[430985.004526] [] io_schedule+0x9d/0x130
[430985.004531] [] cv_wait_common+0xae/0x150 [spl]
[430985.004533] [] ? wake_up_bit+0x30/0x30
[430985.004538] [] __cv_wait_io+0x18/0x20 [spl]
[430985.004552] [] zio_wait+0x123/0x210 [zfs]
[430985.004566] [] dsl_pool_sync+0xbf/0x420 [zfs]
[430985.004582] [] spa_sync+0x388/0xb70 [zfs]
[430985.004583] [] ? autoremove_wake_function+0x2b/0x40
[430985.004600] [] txg_sync_thread+0x3cc/0x640 [zfs]
[430985.004615] [] ? txg_fini+0x2a0/0x2a0 [zfs]
[430985.004620] [] thread_generic_wrapper+0x71/0x80 [spl]
[430985.004624] [] ? __thread_exit+0x20/0x20 [spl]
[430985.004625] [] kthread+0xcf/0xe0
[430985.004627] [] ? kthread_create_on_node+0x140/0x140
[430985.004629] [] ret_from_fork+0x58/0x90
[430985.004630] [] ? kthread_create_on_node+0x140/0x140
[430985.004658] INFO: task BackupPC_trashC:5245 blocked for more than 120 seconds.
[430985.004675] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[430985.004693] BackupPC_trashC D ffff88041e393680 0 5245 5241 0x00000080
[430985.004695] ffff8803db4f3530 0000000000000082 ffff8803f98bc440 ffff8803db4f3fd8
[430985.004696] ffff8803db4f3fd8 ffff8803db4f3fd8 ffff8803f98bc440 ffff8803f99d59d0
[430985.004697] ffff8803f99d59d4 ffff8803f98bc440 00000000ffffffff ffff8803f99d59d8
[430985.004699] Call Trace:
[430985.004701] [] schedule_preempt_disabled+0x29/0x70
[430985.004702] [] __mutex_lock_slowpath+0xc5/0x1c0
[430985.004704] [] mutex_lock+0x1f/0x2f
[430985.004720] [] vdev_queue_io+0x88/0x200 [zfs]
[430985.004736] [] zio_vdev_io_start+0x187/0x2e0 [zfs]
[430985.004752] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004767] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.004782] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.004798] [] ? zio_push_transform+0x39/0x90 [zfs]
[430985.004813] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[430985.004828] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004843] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.004858] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.004872] [] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[430985.004887] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.004896] [] arc_read+0x603/0xad0 [zfs]
[430985.004907] [] dbuf_read+0x2ad/0x930 [zfs]
[430985.004919] [] dmu_buf_hold+0x50/0x80 [zfs]
[430985.004934] [] zap_get_leaf_byblk+0x5c/0x2c0 [zfs]
[430985.004944] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[430985.004958] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[430985.004973] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[430985.004988] [] fzap_add_cd+0x55/0x140 [zfs]
[430985.004993] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[430985.005007] [] fzap_add+0x6a/0x80 [zfs]
[430985.005022] [] zap_add+0x126/0x180 [zfs]
[430985.005037] [] zap_add_int+0x68/0x90 [zfs]
[430985.005053] [] zfs_unlinked_add+0x3c/0x90 [zfs]
[430985.005067] [] zfs_link_destroy+0x54d/0x5a0 [zfs]
[430985.005083] [] zfs_rmdir+0x499/0x650 [zfs]
[430985.005098] [] zpl_rmdir+0x56/0x80 [zfs]
[430985.005103] [] vfs_rmdir+0xa8/0x100
[430985.005105] [] do_rmdir+0x1a5/0x200
[430985.005107] [] ? mntput_no_expire+0x3e/0x120
[430985.005109] [] SyS_rmdir+0x16/0x20
[430985.005111] [] system_call_fastpath+0x16/0x1b
[430985.005117] INFO: task BackupPC_dump:5147 blocked for more than 120 seconds.
[430985.005133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[430985.005151] BackupPC_dump D ffff88041e313680 0 5147 5082 0x00000080
[430985.005153] ffff880070aff670 0000000000000082 ffff8800c3814fa0 ffff880070afffd8
[430985.005154] ffff880070afffd8 ffff880070afffd8 ffff8800c3814fa0 ffff8803f99d09d0
[430985.005156] ffff8803f99d09d4 ffff8800c3814fa0 00000000ffffffff ffff8803f99d09d8
[430985.005157] Call Trace:
[430985.005159] [] schedule_preempt_disabled+0x29/0x70
[430985.005161] [] __mutex_lock_slowpath+0xc5/0x1c0
[430985.005162] [] mutex_lock+0x1f/0x2f
[430985.005178] [] vdev_queue_io+0x88/0x200 [zfs]
[430985.005193] [] ? zio_create+0x3fa/0x500 [zfs]
[430985.005208] [] zio_vdev_io_start+0x187/0x2e0 [zfs]
[430985.005223] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.005238] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.005253] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.005268] [] ? zio_push_transform+0x39/0x90 [zfs]
[430985.005283] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[430985.005297] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.005312] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[430985.005326] [] ? vdev_config_sync+0x140/0x140 [zfs]
[430985.005341] [] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[430985.005363] [] zio_nowait+0xc6/0x1b0 [zfs]
[430985.005372] [] arc_read+0x603/0xad0 [zfs]
[430985.005383] [] dbuf_prefetch+0x1de/0x2e0 [zfs]
[430985.005394] [] dmu_zfetch_dofetch.isra.6+0x114/0x180 [zfs]
[430985.005406] [] dmu_zfetch+0x4ef/0xe00 [zfs]
[430985.005415] [] dbuf_read+0x3f7/0x930 [zfs]
[430985.005425] [] dmu_buf_hold_array_by_dnode+0x123/0x4d0 [zfs]
[430985.005437] [] dmu_read_uio_dnode+0x52/0x100 [zfs]
[430985.005447] [] dmu_read_uio_dbuf+0x47/0x60 [zfs]
[430985.005462] [] zfs_read+0x13d/0x400 [zfs]
[430985.005477] [] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
[430985.005491] [] zpl_read+0x7e/0xb0 [zfs]
[430985.005493] [] vfs_read+0x9c/0x170
[430985.005495] [] SyS_read+0x58/0xb0
[430985.005497] [] system_call_fastpath+0x16/0x1b
[431104.889511] INFO: task txg_sync:2586 blocked for more than 120 seconds.
[431104.889530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[431104.889549] txg_sync D ffff88041e293680 0 2586 2 0x00000000
[431104.889552] ffff8803f8e97b90 0000000000000046 ffff8804006fa220 ffff8803f8e97fd8
[431104.889554] ffff8803f8e97fd8 ffff8803f8e97fd8 ffff8804006fa220 ffff88041e293f48
[431104.889555] ffff880366f97088 ffff880366f970c8 ffff880366f970b0 0000000000000001
[431104.889557] Call Trace:
[431104.889562] [] io_schedule+0x9d/0x130
[431104.889581] [] cv_wait_common+0xae/0x150 [spl]
[431104.889584] [] ? wake_up_bit+0x30/0x30
[431104.889590] [] __cv_wait_io+0x18/0x20 [spl]
[431104.889617] [] zio_wait+0x123/0x210 [zfs]
[431104.889635] [] dsl_pool_sync+0xbf/0x420 [zfs]
[431104.889654] [] spa_sync+0x388/0xb70 [zfs]
[431104.889656] [] ? autoremove_wake_function+0x2b/0x40
[431104.889675] [] txg_sync_thread+0x3cc/0x640 [zfs]
[431104.889692] [] ? txg_fini+0x2a0/0x2a0 [zfs]
[431104.889697] [] thread_generic_wrapper+0x71/0x80 [spl]
[431104.889702] [] ? __thread_exit+0x20/0x20 [spl]
[431104.889703] [] kthread+0xcf/0xe0
[431104.889705] [] ? kthread_create_on_node+0x140/0x140
[431104.889707] [] ret_from_fork+0x58/0x90
[431104.889709] [] ? kthread_create_on_node+0x140/0x140
[431224.773846] INFO: task spl_kmem_cache:115 blocked for more than 120 seconds.
[431224.773866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[431224.773885] spl_kmem_cache D ffff88041e213680 0 115 2 0x00000000
[431224.773887] ffff8800369636b0 0000000000000046 ffff880036958000 ffff880036963fd8
[431224.773889] ffff880036963fd8 ffff880036963fd8 ffff880036958000 ffff8800c7d8ab68
[431224.773891] ffff8800c7d8aa20 ffff8800c7d8ab70 ffff8800c7d8aa48 0000000000000000
[431224.773892] Call Trace:
[431224.773898] [] schedule+0x29/0x70
[431224.773917] [] cv_wait_common+0x125/0x150 [spl]
[431224.773921] [] ? wake_up_bit+0x30/0x30
[431224.773926] [] __cv_wait+0x15/0x20 [spl]
[431224.773953] [] txg_wait_open+0xc3/0x110 [zfs]
[431224.773969] [] dmu_tx_wait+0x3a8/0x3c0 [zfs]
[431224.773982] [] dmu_tx_assign+0x9a/0x510 [zfs]
[431224.774002] [] zfs_inactive+0xda/0x240 [zfs]
[431224.774004] [] ? truncate_pagecache+0x59/0x60
[431224.774022] [] zpl_evict_inode+0x43/0x60 [zfs]
[431224.774027] [] evict+0xa7/0x170
[431224.774029] [] dispose_list+0x3e/0x50
[431224.774030] [] prune_icache_sb+0x163/0x320
[431224.774032] [] prune_super+0x16b/0x1a0
[431224.774034] [] shrink_slab+0x165/0x300
[431224.774036] [] ? vmpressure+0x21/0x90
[431224.774038] [] do_try_to_free_pages+0x3c2/0x4e0
[431224.774039] [] try_to_free_pages+0xfc/0x180
[431224.774042] [] __alloc_pages_nodemask+0x7fd/0xb90
[431224.774045] [] alloc_pages_current+0xa9/0x170
[431224.774047] [] __vmalloc_node_range+0x15b/0x270
[431224.774052] [] ? spl_vmalloc+0x34/0x60 [spl]
[431224.774054] [] __vmalloc+0x41/0x50
[431224.774058] [] ? spl_vmalloc+0x34/0x60 [spl]
[431224.774062] [] spl_vmalloc+0x34/0x60 [spl]
[431224.774067] [] kv_alloc.isra.5+0x87/0x90 [spl]
[431224.774071] [] spl_cache_grow_work+0x4f/0x2d0 [spl]
[431224.774075] [] taskq_thread+0x21e/0x420 [spl]
[431224.774077] [] ? wake_up_state+0x20/0x20
[431224.774082] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[431224.774084] [] kthread+0xcf/0xe0
[431224.774085] [] ? kthread_create_on_node+0x140/0x140
[431224.774088] [] ret_from_fork+0x58/0x90
[431224.774089] [] ? kthread_create_on_node+0x140/0x140
[431224.774090] INFO: task spl_kmem_cache:117 blocked for more than 120 seconds.
[431224.774108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[431224.774126] spl_kmem_cache D ffff88041e313680 0 117 2 0x00000000
[431224.774128] ffff88003696b6b0 0000000000000046 ffff8800369596c0 ffff88003696bfd8
[431224.774129] ffff88003696bfd8 ffff88003696bfd8 ffff8800369596c0 ffff8800c7d8ab68
[431224.774130] ffff8800c7d8aa20 ffff8800c7d8ab70 ffff8800c7d8aa48 0000000000000000
[431224.774132] Call Trace:
[431224.774134] [] schedule+0x29/0x70
[431224.774139] [] cv_wait_common+0x125/0x150 [spl]
[431224.774141] [] ? wake_up_bit+0x30/0x30
[431224.774145] [] __cv_wait+0x15/0x20 [spl]
[431224.774163] [] txg_wait_open+0xc3/0x110 [zfs]
[431224.774176] [] dmu_tx_wait+0x3a8/0x3c0 [zfs]
[431224.774189] [] dmu_tx_assign+0x9a/0x510 [zfs]
[431224.774206] [] zfs_inactive+0xda/0x240 [zfs]
[431224.774208] [] ? truncate_pagecache+0x59/0x60
[431224.774224] [] zpl_evict_inode+0x43/0x60 [zfs]
[431224.774227] [] evict+0xa7/0x170
[431224.774229] [] dispose_list+0x3e/0x50
[431224.774231] [] prune_icache_sb+0x163/0x320
[431224.774232] [] prune_super+0x16b/0x1a0
[431224.774234] [] shrink_slab+0x165/0x300
[431224.774235] [] ? vmpressure+0x21/0x90
[431224.774237] [] do_try_to_free_pages+0x3c2/0x4e0
[431224.774239] [] try_to_free_pages+0xfc/0x180
[431224.774241] [] __alloc_pages_nodemask+0x7fd/0xb90
[431224.774243] [] alloc_pages_current+0xa9/0x170
[431224.774245] [] __vmalloc_node_range+0x15b/0x270
[431224.774249] [] ? spl_vmalloc+0x34/0x60 [spl]
[431224.774251] [] __vmalloc+0x41/0x50
[431224.774255] [] ? spl_vmalloc+0x34/0x60 [spl]
[431224.774259] [] spl_vmalloc+0x34/0x60 [spl]
[431224.774264] [] kv_alloc.isra.5+0x87/0x90 [spl]
[431224.774268] [] spl_cache_grow_work+0x4f/0x2d0 [spl]
[431224.774272] [] taskq_thread+0x21e/0x420 [spl]
[431224.774274] [] ? wake_up_state+0x20/0x20
[431224.774278] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[431224.774280] [] kthread+0xcf/0xe0
[431224.774281] [] ? kthread_create_on_node+0x140/0x140
[431224.774283] [] ret_from_fork+0x58/0x90
[431224.774284] [] ? kthread_create_on_node+0x140/0x140
[431224.774305] INFO: task z_wr_iss:2301 blocked for more than 120 seconds.
[431224.774321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[431224.774340] z_wr_iss D ffff88041e393680 0 2301 2 0x00000000
[431224.774341] ffff88040035faf0 0000000000000046 ffff88040705ad80 ffff88040035ffd8
[431224.774342] ffff88040035ffd8 ffff88040035ffd8 ffff88040705ad80 ffff8803f99d09d0
[431224.774344] ffff8803f99d09d4 ffff88040705ad80 00000000ffffffff ffff8803f99d09d8
[431224.774345] Call Trace:
[431224.774347] [] schedule_preempt_disabled+0x29/0x70
[431224.774349] [] __mutex_lock_slowpath+0xc5/0x1c0
[431224.774351] [] mutex_lock+0x1f/0x2f
[431224.774368] [] vdev_queue_io+0x88/0x200 [zfs]
[431224.774385] [] ? zio_create+0x3fa/0x500 [zfs]
[431224.774401] [] zio_vdev_io_start+0x187/0x2e0 [zfs]
[431224.774417] [] zio_nowait+0xc6/0x1b0 [zfs]
[431224.774434] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[431224.774450] [] ? vdev_config_sync+0x140/0x140 [zfs]
[431224.774465] [] ? zio_push_transform+0x39/0x90 [zfs]
[431224.774484] [] zio_vdev_io_start+0x9f/0x2e0 [zfs]
[431224.774503] [] zio_nowait+0xc6/0x1b0 [zfs]
[431224.774522] [] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[431224.774541] [] ? vdev_config_sync+0x140/0x140 [zfs]
[431224.774560] [] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[431224.774579] [] zio_execute+0xc8/0x180 [zfs]
[431224.774584] [] taskq_thread+0x21e/0x420 [spl]
[431224.774586] [] ? wake_up_state+0x20/0x20
[431224.774591] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[431224.774592] [] kthread+0xcf/0xe0
[431224.774594] [] ? kthread_create_on_node+0x140/0x140
[431224.774596] [] ret_from_fork+0x58/0x90
[431224.774598] [] ? kthread_create_on_node+0x140/0x140
[431224.774613] INFO: task txg_sync:2586 blocked for more than 120 seconds.
[431224.774633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[431224.774656] txg_sync D ffff88041e293680 0 2586 2 0x00000000
[431224.774658] ffff8803f8e97b90 0000000000000046 ffff8804006fa220 ffff8803f8e97fd8
[431224.774659] ffff8803f8e97fd8 ffff8803f8e97fd8 ffff8804006fa220 ffff88041e293f48
[431224.774661] ffff880366f97088 ffff880366f970c8 ffff880366f970b0 0000000000000001
[431224.774663] Call Trace:
[431224.774665] [] io_schedule+0x9d/0x130
[431224.774670] [] cv_wait_common+0xae/0x150 [spl]
[431224.774672] [] ? wake_up_bit+0x30/0x30
[431224.774677] [] __cv_wait_io+0x18/0x20 [spl]
[431224.774696] [] zio_wait+0x123/0x210 [zfs]
[431224.774713] [] dsl_pool_sync+0xbf/0x420 [zfs]
[431224.774732] [] spa_sync+0x388/0xb70 [zfs]
[431224.774734] [] ? autoremove_wake_function+0x2b/0x40
[431224.774754] [] txg_sync_thread+0x3cc/0x640 [zfs]
[431224.774773] [] ? txg_fini+0x2a0/0x2a0 [zfs]
[431224.774779] [] thread_generic_wrapper+0x71/0x80 [spl]
[431224.774784] [] ? __thread_exit+0x20/0x20 [spl]
[431224.774785] [] kthread+0xcf/0xe0
[431224.774787] [] ? kthread_create_on_node+0x140/0x140
[431224.774789] [] ret_from_fork+0x58/0x90
[431224.774790] [] ? kthread_create_on_node+0x140/0x140

@rgmiller rgmiller reopened this Oct 26, 2015
@koptein koptein mentioned this issue Oct 30, 2015
@rgmiller
Author

rgmiller commented Nov 9, 2015

Just a quick note to say that I experienced another lockup over the weekend. (This is on 0.6.5.3 with spl_taskq_thread_dynamic set to 0.) I have the stack traces from dmesg, but they don't look too different from the ones I've already posted, so I'm not going to bother unless someone wants them.

@behlendorf
Contributor

@rgmiller sorry about the slow reply. The good news is that, thanks to the stacks you posted, I'm pretty sure I've identified the deadlock you're hitting. If I'm correct, you should be able to prevent it by setting the zfs_vdev_aggregation_limit module option to 16384 or lower while I work up a patch. This will disable aggregation for larger blocks, which is where this deadlock can occur due to a zio buffer allocation.
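A sketch of how that can be applied as root; zfs_vdev_aggregation_limit is exposed under /sys/module/zfs/parameters on a standard ZoL install, and the modprobe.d filename below is just a convention:

# check the current value
cat /sys/module/zfs/parameters/zfs_vdev_aggregation_limit

# lower it on the running module
echo 16384 > /sys/module/zfs/parameters/zfs_vdev_aggregation_limit

# keep the setting across reboots
echo 'options zfs zfs_vdev_aggregation_limit=16384' >> /etc/modprobe.d/zfs.conf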

@rgmiller
Author

Great. I'll give it a try this evening. What should I do about the spl_taskq_thread_dynamic setting? It's currently at 0. I assume I should turn it back on again, yes?

behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 16, 2015
This deadlock may manifest itself in slightly different ways but
at the core it is caused by a memory allocation blocking on file-
system reclaim in the zio pipeline.  This is normally impossible
because zio_execute() disables filesystem reclaim by setting
PF_FSTRANS on the thread.  However, kmem cache allocations may
still indirectly block on file system reclaim while holding the
critical vq->vq_lock as shown below.

To resolve this issue zio_buf_alloc_flags() is introduced, which allows
allocation flags to be passed.  This can then be used in
vdev_queue_aggregate() with KM_NOSLEEP when allocating the
aggregate IO buffer.  Since aggregating the IO is purely a
performance optimization we want this to either succeed or fail
quickly.  Trying too hard to allocate this memory under the
vq->vq_lock can negatively impact performance and result in
this deadlock.

* z_wr_iss
zio_vdev_io_start
  vdev_queue_io -> Takes vq->vq_lock
    vdev_queue_io_to_issue
      vdev_queue_aggregate
        zio_buf_alloc -> Waiting on spl_kmem_cache process

* z_wr_int
zio_vdev_io_done
  vdev_queue_io_done
    mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss

* txg_sync
spa_sync
  dsl_pool_sync
    zio_wait -> Waiting on zio being handled by z_wr_int

* spl_kmem_cache
spl_vmalloc
  ...
  evict
    ...
    zfs_inactive
      dmu_tx_wait
        txg_wait_open -> Waiting on txg_sync

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3808
Issue openzfs#3867
@behlendorf
Contributor

@rgmiller you should be able to leave the default. For 0.6.5.3 that's spl_taskq_thread_dynamic=0; if you're running master it's spl_taskq_thread_dynamic=1, but a patch has been applied there which resolves the root cause correctly. Or, if you could verify the patch in #4111 without modifying zfs_vdev_aggregation_limit, that would be great.
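A quick way to double-check what is actually loaded and where the tunables sit before and after testing the patch (a sketch; the version files assume the standard ZoL module packaging):

# loaded module versions
cat /sys/module/spl/version
cat /sys/module/zfs/version

# current values of the two tunables discussed in this thread
cat /sys/module/spl/parameters/spl_taskq_thread_dynamic
cat /sys/module/zfs/parameters/zfs_vdev_aggregation_limit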

behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 17, 2015
@rgmiller
Author

Just a quick update: I installed patch #4111 this evening and it's running now. The overnight backup jobs are what normally caused problems, so I'll leave another update after it's run for a couple more days.

@rgmiller
Author

The overnight backup jobs have run 3 times now without problems, so it looks like the patch is solid.

@behlendorf
Contributor

@rgmiller excellent news, thanks for following up.

behlendorf added a commit that referenced this issue Dec 24, 2015
This deadlock may manifest itself in slightly different ways but
at the core it is caused by a memory allocation blocking on file-
system reclaim in the zio pipeline.  This is normally impossible
because zio_execute() disables filesystem reclaim by setting
PF_FSTRANS on the thread.  However, kmem cache allocations may
still indirectly block on file system reclaim while holding the
critical vq->vq_lock as shown below.

To resolve this issue zio_buf_alloc_flags() is introduced, which allows
allocation flags to be passed.  This can then be used in
vdev_queue_aggregate() with KM_NOSLEEP when allocating the
aggregate IO buffer.  Since aggregating the IO is purely a
performance optimization we want this to either succeed or fail
quickly.  Trying too hard to allocate this memory under the
vq->vq_lock can negatively impact performance and result in
this deadlock.

* z_wr_iss
zio_vdev_io_start
  vdev_queue_io -> Takes vq->vq_lock
    vdev_queue_io_to_issue
      vdev_queue_aggregate
        zio_buf_alloc -> Waiting on spl_kmem_cache process

* z_wr_int
zio_vdev_io_done
  vdev_queue_io_done
    mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss

* txg_sync
spa_sync
  dsl_pool_sync
    zio_wait -> Waiting on zio being handled by z_wr_int

* spl_kmem_cache
spl_cache_grow_work
  kv_alloc
    spl_vmalloc
      ...
      evict
        zpl_evict_inode
          zfs_inactive
            dmu_tx_wait
              txg_wait_open -> Waiting on txg_sync

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #3808
Closes #3867
@blballa

blballa commented Jan 4, 2016

First, I want to thank everyone for their hard work. The code is awesome.
I can confirm that this bug exists on 0.6.5.3 whenever we put ZFS under heavy load, such as rsync or large file copies over Samba. We went ahead and pulled head from git, and we can confirm that we are no longer getting kernel panics requiring a hard reboot of the servers. It might make sense to release a 0.6.5.4 so that official packages can be built for Ubuntu, since right now, if you upgrade from the 0.6.4 train (which was stable), you will most likely hit deadlocks. Let me know if you need any more information about our setup or logs.
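For anyone who wants to run from git until 0.6.5.4 is packaged, the out-of-tree build looked roughly like the steps below at the time. This is only a sketch: kernel headers, autotools, and the usual build dependencies are assumed to be installed, spl must be built and installed before zfs, and zfs's configure may need --with-spl pointed at the spl source tree.

git clone https://github.com/zfsonlinux/spl.git
git clone https://github.com/zfsonlinux/zfs.git

cd spl && ./autogen.sh && ./configure && make && sudo make install && cd ..
cd zfs && ./autogen.sh && ./configure && make && sudo make install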

@behlendorf
Contributor

Thanks for the confirmation, a 0.6.5.4 release is definitely in the works.

ryao pushed a commit to ryao/zfs that referenced this issue Jan 4, 2016
goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Jan 17, 2016
goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Feb 3, 2016
goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Feb 4, 2016
@behlendorf behlendorf modified the milestones: 0.6.5.4, 0.7.0 Mar 23, 2016