ZFS_OBJ_HOLD_ENTER deadlock #1101
This is likely the cause of the lockup in #676.
http://dehacked.2y.net/sizes-4h.png The workaround you gave in IRC (increase ZFS_OBJ_MTX_SZ in include/sys/zfs_vfs.h to 256) does wonders for fixing issue 676. Normally this system would have hung within 30 minutes.
Excellent, so that's pretty good confirmation this is the root cause of your deadlocks. Increasing the array size sounds like a reasonable short-term workaround, but it just makes the issue less likely; it doesn't actually fix it. If you need to, you could increase the value further until we have a proper fix.
Here's the deadlock; it's basically a lock inversion. Now that I have the exact stacks I can put together a patch to prevent it. In the meanwhile, increasing ZFS_OBJ_MTX_SZ is a decent workaround.
@DeHackEd Can you please apply the above patch to the SPL? It should resolve the hard deadlock.
Allowing the spl_cache_grow_work() function to reclaim inodes allows for two unlikely deadlocks. Therefore, we clear __GFP_FS for these allocations. The two deadlocks are:

* While holding the ZFS_OBJ_HOLD_ENTER(zsb, obj1) lock, a function calls kmem_cache_alloc(), which happens to need to allocate a new slab. To allocate the new slab we enter FS-level reclaim and attempt to evict several inodes. To evict these inodes we need to take the ZFS_OBJ_HOLD_ENTER(zsb, obj2) lock, and it just happens that obj1 and obj2 use the same hashed lock.

* Similar to the first case, except instead of getting blocked on the hash lock we block in txg_wait_open(), which is waiting for the next txg, which isn't coming because the txg_sync thread is blocked in kmem_cache_alloc().

Note this isn't a 100% fix because vmalloc() won't strictly honor __GFP_FS. However, in practice this is sufficient because several very unlikely things must all occur concurrently.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#1101
Running off the latest Git masters for both SPL and ZFS I hung the system like so:

```
# free
             total       used       free     shared    buffers     cached
Mem:      16448180   16109116     339064          0       1520      16212
-/+ buffers/cache:   16091384     356796
Swap:      2097148     157880    1939268
```

Interestingly it's not hard-deadlocked and I can still work around the areas of the disk which are not ZFS filesystems (with choppy performance). After doing `killall -9 rsync` it unjammed itself.
@DeHackEd Did you have similar issues with 6c5207088f732168569d1a0b29f5f949b91bb503 applied?
Due to the slightly increased size of the ZFS super block caused by 30315d2 there are now allocation warnings. The allocation size is still small (just over 8k) and super blocks are rarely allocated so we suppress the warning. Signed-off-by: Brian Behlendorf <[email protected]> Issue #1101
Combining 6c5207088f732168569d1a0b29f5f949b91bb503 with the other patches (the SPL patch and ZFS_OBJ_MTX_SZ -> 256) produced a system that under similar load may still exceed its arc_meta_limit but doesn't crash. I think stacking all these patches together is outright necessary on my workload. :/
@DeHackEd Just to be clear: with all of these patches applied your system is stable (aside from exceeding the limit)? If so I'll revisit 6c5207088f732168569d1a0b29f5f949b91bb503 to see if it can be reworked into something which can be merged. I'd hoped to defer further changes in that area until 0.7.0, but perhaps something minimal can be done.
Increasing this limit costs us 6144 bytes of memory per mounted filesystem, but this is a small price to pay for accomplishing the following:

* Allows for up to 256-way concurrency when performing lookups, which helps performance when there are a large number of processes.

* Minimizes the likelihood of encountering the deadlock described in issue openzfs#1101. Because vmalloc() won't strictly honor __GFP_FS there is still a very remote chance of a deadlock. See the openzfs/spl@043f9b57 commit.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1101
This appears to be caused by a deadlock on the ZFS_OBJ_HOLD_ENTER() mutex. I was able to recreate it in spl/zfs-0.6.0-rc12 by running 100 concurrent rsyncs of a directory containing 1/4 million files and 1 du.