ZFS_OBJ_HOLD_ENTER deadlock #1101

Closed
behlendorf opened this issue Nov 20, 2012 · 9 comments

@behlendorf
Contributor

This appears to be caused by a deadlock on the ZFS_OBJ_HOLD_ENTER() mutex. I was able to recreate it on spl/zfs-0.6.0-rc12 by running 100 concurrent rsyncs of a directory tree containing a quarter million files, plus concurrent du walks of the pool.

# Populate a tmpfs tree with a large number of small files.
mkdir /dev/shm/rsync/
fdtree -C -l3 -d10 -f100 -s1 -o /dev/shm/rsync

# Race 100 recursive copies against concurrent du walks of the pool.
for (( i=0; i<100; i++ )); do
    rsync -a /dev/shm/rsync /tank/rsync${i} &
    du /tank >/dev/null &
done

wait
warp-1a login: INFO: task kswapd0:58 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0       D 0000000000000003     0    58      2 0x00000000
 ffff880227bcdaa0 0000000000000046 0000000000000000 ffffffff810618d4
 ffff8800a10ee080 00000000a10ee080 ffff880028395fc0 ffff880028395fc0
 ffff880227bcbab8 ffff880227bcdfd8 000000000000f4e8 ffff880227bcbab8
Call Trace:
 [<ffffffff810618d4>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff814f07fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffffa03527db>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffff814f069b>] mutex_lock+0x2b/0x50
 [<ffffffffa048bdfe>] zfs_zinactive+0x7e/0x110 [zfs]
 [<ffffffffa048ae1f>] zfs_inactive+0x7f/0x220 [zfs]
 [<ffffffffa049cd3e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff8119256c>] clear_inode+0xac/0x140
 [<ffffffff81192640>] dispose_list+0x40/0x120
 [<ffffffff81192994>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff81129c0a>] shrink_slab+0x12a/0x1a0
 [<ffffffff8112c9bd>] balance_pgdat+0x57d/0x7e0
 [<ffffffff8112cfd0>] ? isolate_pages_global+0x0/0x350
 [<ffffffff81133846>] ? set_pgdat_percpu_threshold+0xa6/0xd0
 [<ffffffff8112cd56>] kswapd+0x136/0x3b0
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8112cc20>] ? kswapd+0x0/0x3b0
 [<ffffffff81090d26>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81090c90>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task arc_adapt:1695 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
arc_adapt     D 0000000000000003     0  1695      2 0x00000000
 ffff880216aefc40 0000000000000046 ffff8800ad7aab18 ffff880028215fc0
 ffff880216aefc00 ffffffff81053174 ffff880216aefbe0 ffffffff810618d4
 ffff880226be5058 ffff880216aeffd8 000000000000f4e8 ffff880226be5058
Call Trace:
 [<ffffffff81053174>] ? check_preempt_wakeup+0x1a4/0x260
 [<ffffffff810618d4>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff814f068e>] ? mutex_lock+0x1e/0x50
 [<ffffffff814f07fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffffa03527db>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffff814f069b>] mutex_lock+0x2b/0x50
 [<ffffffffa048bdfe>] zfs_zinactive+0x7e/0x110 [zfs]
 [<ffffffffa048ae1f>] zfs_inactive+0x7f/0x220 [zfs]
 [<ffffffffa049cd3e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff8119256c>] clear_inode+0xac/0x140
 [<ffffffff81192640>] dispose_list+0x40/0x120
 [<ffffffff81192994>] shrink_icache_memory+0x274/0x2e0
 [<ffffffffa049cec0>] ? zpl_prune_sbs+0x0/0x70 [zfs]
 [<ffffffffa049cf17>] zpl_prune_sbs+0x57/0x70 [zfs]
 [<ffffffffa03fb200>] arc_adjust_meta+0x120/0x1e0 [zfs]
 [<ffffffffa03fb2c0>] ? arc_adapt_thread+0x0/0xd0 [zfs]
 [<ffffffffa03fb2c0>] ? arc_adapt_thread+0x0/0xd0 [zfs]
 [<ffffffffa03fb32a>] arc_adapt_thread+0x6a/0xd0 [zfs]
 [<ffffffffa0356708>] thread_generic_wrapper+0x68/0x80 [spl]
 [<ffffffffa03566a0>] ? thread_generic_wrapper+0x0/0x80 [spl]
 [<ffffffff81090d26>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81090c90>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task irqbalance:2104 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
irqbalance    D 0000000000000001     0  2104      1 0x00000000
 ffff880216b2d5e8 0000000000000082 0000000000000000 ffff880000028d80
 0000000000000008 ffff880000030dd8 ffff880216b2d5c8 ffffffff8112311a
 ffff880228ea9058 ffff880216b2dfd8 000000000000f4e8 ffff880228ea9058
Call Trace:
 [<ffffffff8112311a>] ? free_one_page+0x11a/0x350
 [<ffffffff814f07fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffffa03527db>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffff814f069b>] mutex_lock+0x2b/0x50
 [<ffffffffa048bdfe>] zfs_zinactive+0x7e/0x110 [zfs]
 [<ffffffffa048ae1f>] zfs_inactive+0x7f/0x220 [zfs]
 [<ffffffffa049cd3e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff8119256c>] clear_inode+0xac/0x140
 [<ffffffff81192640>] dispose_list+0x40/0x120
 [<ffffffff81192994>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff81129c0a>] shrink_slab+0x12a/0x1a0
 [<ffffffff8112ba6f>] do_try_to_free_pages+0x2ff/0x520
 [<ffffffff8112be7d>] try_to_free_pages+0x9d/0x130
 [<ffffffff8112cfd0>] ? isolate_pages_global+0x0/0x350
 [<ffffffff81123dad>] __alloc_pages_nodemask+0x40d/0x940
 [<ffffffff8120e200>] ? security_task_setrlimit+0x10/0x20
 [<ffffffff8115886a>] alloc_pages_vma+0x9a/0x150
 [<ffffffff8113c19b>] handle_pte_fault+0x76b/0xb50
 [<ffffffff81203133>] ? init_exit__mq_timedsend+0x3/0x70
 [<ffffffff8119237e>] ? inode_init_always+0x11e/0x1c0
 [<ffffffff81277306>] ? vsnprintf+0x2b6/0x5f0
 [<ffffffff8113c764>] handle_mm_fault+0x1e4/0x2b0
 [<ffffffff81042c29>] __do_page_fault+0x139/0x480
 [<ffffffff8119a198>] ? seq_printf+0x58/0x90
 [<ffffffff8119a823>] ? single_open+0x83/0x110
 [<ffffffff811e6061>] ? show_stat+0x5d1/0x5f0
 [<ffffffff814f4bde>] do_page_fault+0x3e/0xa0
 [<ffffffff814f1f95>] page_fault+0x25/0x30
 [<ffffffff812779ed>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff8119a65e>] ? seq_read+0x2ae/0x3f0
 [<ffffffff811dcafe>] proc_reg_read+0x7e/0xc0
 [<ffffffff811783d5>] vfs_read+0xb5/0x1a0
 [<ffffffff81178511>] sys_read+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task txg_quiesce:3210 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_quiesce   D 0000000000000002     0  3210      2 0x00000000
 ffff88020159bd50 0000000000000046 ffff88020159bcf0 ffffffff8115f93c
 ffff88020159bd20 ffffffff000080d0 ffff88020eec85f8 ffff88020eec8040
 ffff88020eec85f8 ffff88020159bfd8 000000000000f4e8 ffff88020eec85f8
Call Trace:
 [<ffffffff8115f93c>] ? __kmalloc+0x20c/0x220
 [<ffffffff8109129e>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa035da60>] cv_wait_common+0xa0/0x1a0 [spl]
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8127b03c>] ? __bitmap_weight+0x8c/0xb0
 [<ffffffffa035db93>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa0451de3>] txg_quiesce_thread+0x243/0x3a0 [zfs]
 [<ffffffff81052139>] ? set_user_nice+0xc9/0x130
 [<ffffffffa0451ba0>] ? txg_quiesce_thread+0x0/0x3a0 [zfs]
 [<ffffffffa0356708>] thread_generic_wrapper+0x68/0x80 [spl]
 [<ffffffffa03566a0>] ? thread_generic_wrapper+0x0/0x80 [spl]
 [<ffffffff81090d26>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81090c90>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task rsync:3425 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync         D 0000000000000002     0  3425   3336 0x00000000
 ffff8800b6d574c8 0000000000000082 0000000000000000 ffff880028215fc0
 0000000000000002 0000000000000002 ffff8800b6d57468 ffffffff810618d4
 ffff8800b6e21098 ffff8800b6d57fd8 000000000000f4e8 ffff8800b6e21098
Call Trace:
 [<ffffffff810618d4>] ? enqueue_task_fair+0x64/0x100
 [<ffffffff814f07fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffffa03527db>] ? kmem_free_debug+0x4b/0x150 [spl]
 [<ffffffff814f069b>] mutex_lock+0x2b/0x50
 [<ffffffffa048bdfe>] zfs_zinactive+0x7e/0x110 [zfs]
 [<ffffffffa048ae1f>] zfs_inactive+0x7f/0x220 [zfs]
 [<ffffffffa049cd3e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff8119256c>] clear_inode+0xac/0x140
 [<ffffffff81192640>] dispose_list+0x40/0x120
 [<ffffffff81192994>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff81129c0a>] shrink_slab+0x12a/0x1a0
 [<ffffffff8112ba6f>] do_try_to_free_pages+0x2ff/0x520
 [<ffffffff8112be7d>] try_to_free_pages+0x9d/0x130
 [<ffffffff8112cfd0>] ? isolate_pages_global+0x0/0x350
 [<ffffffff81123dad>] __alloc_pages_nodemask+0x40d/0x940
 [<ffffffff8115e2f2>] kmem_getpages+0x62/0x170
 [<ffffffff8115ef0a>] fallback_alloc+0x1ba/0x270
 [<ffffffff8115e95f>] ? cache_grow+0x2cf/0x320
 [<ffffffff8115ec89>] ____cache_alloc_node+0x99/0x160
 [<ffffffff81423e6a>] ? __alloc_skb+0x7a/0x180
 [<ffffffff8115fb4f>] kmem_cache_alloc_node_notrace+0x6f/0x130
 [<ffffffff8115fd8b>] __kmalloc_node+0x7b/0x100
 [<ffffffff81423e6a>] __alloc_skb+0x7a/0x180
 [<ffffffff8141fca0>] sock_alloc_send_pskb+0x1e0/0x350
 [<ffffffff814be501>] ? unix_stream_recvmsg+0x351/0x730
 [<ffffffff8141fe25>] sock_alloc_send_skb+0x15/0x20
 [<ffffffff814bed90>] unix_stream_sendmsg+0x2a0/0x420
 [<ffffffff8141e31e>] sock_aio_write+0x15e/0x170
 [<ffffffff8117789a>] do_sync_write+0xfa/0x140
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109bca9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8118c558>] ? poll_select_copy_remaining+0xf8/0x150
 [<ffffffff8120de26>] ? security_file_permission+0x16/0x20
 [<ffffffff81177c64>] vfs_write+0x184/0x1a0
 [<ffffffff811785a1>] sys_write+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task rsync:3444 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync         D 0000000000000001     0  3444   3421 0x00000000
 ffff8800b66a98c8 0000000000000082 0000000000000030 0000000000000030
 ffff8800b6684aa0 0000000000000000 ffff8800b66a98a8 0000000000000286
 ffff8800b6685058 ffff8800b66a9fd8 000000000000f4e8 ffff8800b6685058
Call Trace:
 [<ffffffff8109137e>] ? prepare_to_wait+0x4e/0x80
 [<ffffffff81191ea8>] __wait_on_freeing_inode+0x98/0xc0
 [<ffffffff810910d0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa03f7b74>] ? arc_buf_remove_ref+0xd4/0x120 [zfs]
 [<ffffffff81191f28>] find_inode_fast+0x58/0x80
 [<ffffffff8119308c>] ifind_fast+0x3c/0xb0
 [<ffffffff8119313b>] ilookup+0x3b/0x50
 [<ffffffffa048dc4a>] zfs_zget+0xaa/0x1e0 [zfs]
 [<ffffffffa046f64c>] zfs_dirent_lock+0x47c/0x540 [zfs]
 [<ffffffffa046f91b>] zfs_dirlook+0x8b/0x270 [zfs]
 [<ffffffffa035ef2f>] ? tsd_exit+0x5f/0x1c0 [spl]
 [<ffffffffa04880ce>] zfs_lookup+0x2fe/0x350 [zfs]
 [<ffffffffa049c367>] zpl_lookup+0x57/0xc0 [zfs]
 [<ffffffff8118604c>] do_lookup+0x16c/0x1e0
 [<ffffffff811863cd>] __link_path_walk+0x20d/0x1030
 [<ffffffff8118747a>] path_walk+0x6a/0xe0
 [<ffffffff8118764b>] do_path_lookup+0x5b/0xa0
 [<ffffffff811882b7>] user_path_at+0x57/0xa0
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109bca9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8117cea6>] vfs_fstatat+0x46/0x80
 [<ffffffff8117cf4e>] vfs_lstat+0x1e/0x20
 [<ffffffff8117cf74>] sys_newlstat+0x24/0x50
 [<ffffffff811785a1>] ? sys_write+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task rsync:3445 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync         D 0000000000000001     0  3445   3420 0x00000000
 ffff8800b66cdbf8 0000000000000086 ffff8800b66cdb78 ffff880000000000
 ffff8800b66cdb78 0000000000000082 ffff880028295fc0 ffff88009cd95a38
 ffff8800b6615058 ffff8800b66cdfd8 000000000000f4e8 ffff8800b6615058
Call Trace:
 [<ffffffff8109129e>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa035da60>] cv_wait_common+0xa0/0x1a0 [spl]
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa035db93>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa0450fcb>] txg_wait_open+0x8b/0x110 [zfs]
 [<ffffffffa041475d>] dmu_tx_wait+0xed/0xf0 [zfs]
 [<ffffffffa0486db3>] zfs_mkdir+0x1b3/0x590 [zfs]
 [<ffffffffa049ca4e>] zpl_mkdir+0x9e/0xe0 [zfs]
 [<ffffffff811839c3>] ? generic_permission+0x23/0xb0
 [<ffffffff81184c97>] vfs_mkdir+0xa7/0x100
 [<ffffffff81187e5e>] sys_mkdirat+0xfe/0x120
 [<ffffffff8117cf74>] ? sys_newlstat+0x24/0x50
 [<ffffffff81187e98>] sys_mkdir+0x18/0x20
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task rsync:3449 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync         D 0000000000000001     0  3449   3368 0x00000000
 ffff8800b66f7928 0000000000000082 ffff8800b66f7980 000000000000000a
 ffff88015ee22a40 ffff8800ac81a9e0 ffff8800b66f78e8 0000000000000246
 ffff8800b66145f8 ffff8800b66f7fd8 000000000000f4e8 ffff8800b66145f8
Call Trace:
 [<ffffffff814f07fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814f069b>] mutex_lock+0x2b/0x50
 [<ffffffffa048dc55>] zfs_zget+0xb5/0x1e0 [zfs]
 [<ffffffff8118cefd>] ? do_select+0x5ed/0x6b0
 [<ffffffffa046f64c>] zfs_dirent_lock+0x47c/0x540 [zfs]
 [<ffffffffa046f91b>] zfs_dirlook+0x8b/0x270 [zfs]
 [<ffffffff81423e6a>] ? __alloc_skb+0x7a/0x180
 [<ffffffffa04880ce>] zfs_lookup+0x2fe/0x350 [zfs]
 [<ffffffffa049c367>] zpl_lookup+0x57/0xc0 [zfs]
 [<ffffffff8118604c>] do_lookup+0x16c/0x1e0
 [<ffffffff811868f4>] __link_path_walk+0x734/0x1030
 [<ffffffff8118747a>] path_walk+0x6a/0xe0
 [<ffffffff8118764b>] do_path_lookup+0x5b/0xa0
 [<ffffffff811882b7>] user_path_at+0x57/0xa0
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109bca9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8117cea6>] vfs_fstatat+0x46/0x80
 [<ffffffff8117cf4e>] vfs_lstat+0x1e/0x20
 [<ffffffff8117cf74>] sys_newlstat+0x24/0x50
 [<ffffffff811785a1>] ? sys_write+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task rsync:3479 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync         D 0000000000000001     0  3479   3457 0x00000000
 ffff8800b5633b18 0000000000000082 ffff8800b5633a98 ffff880000000000
 ffff8800b5633a98 0000000000000086 ffff880028295fc0 ffff88009cd95a38
 ffff8800b55eb098 ffff8800b5633fd8 000000000000f4e8 ffff8800b55eb098
Call Trace:
 [<ffffffff8109129e>] ? prepare_to_wait_exclusive+0x4e/0x80
 [<ffffffffa035da60>] cv_wait_common+0xa0/0x1a0 [spl]
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa035db93>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa0450fcb>] txg_wait_open+0x8b/0x110 [zfs]
 [<ffffffffa041475d>] dmu_tx_wait+0xed/0xf0 [zfs]
 [<ffffffffa0487850>] zfs_create+0x290/0x6f0 [zfs]
 [<ffffffffa049cc07>] zpl_create+0xa7/0xe0 [zfs]
 [<ffffffff811839c3>] ? generic_permission+0x23/0xb0
 [<ffffffff811853a4>] vfs_create+0xb4/0xe0
 [<ffffffff81188faf>] do_filp_open+0xb2f/0xd60
 [<ffffffff81194c12>] ? alloc_fd+0x92/0x160
 [<ffffffff81175169>] do_sys_open+0x69/0x140
 [<ffffffff81175280>] sys_open+0x20/0x30
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task rsync:3509 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync         D 0000000000000003     0  3509   3348 0x00000000
 ffff8800b49c9a88 0000000000000086 ffff8800b49c9a50 ffff8800b49c9a4c
 0000000000000020 ffff88022fc23680 ffff880028295fc0 0000000000000207
 ffff8800b49c7af8 ffff8800b49c9fd8 000000000000f4e8 ffff8800b49c7af8
Call Trace:
 [<ffffffffa035da60>] cv_wait_common+0xa0/0x1a0 [spl]
 [<ffffffff81091090>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa035db93>] __cv_wait+0x13/0x20 [spl]
 [<ffffffffa0450fcb>] txg_wait_open+0x8b/0x110 [zfs]
 [<ffffffffa041475d>] dmu_tx_wait+0xed/0xf0 [zfs]
 [<ffffffffa048560c>] zfs_setattr+0x14bc/0x17a0 [zfs]
 [<ffffffff8118e563>] ? d_rehash+0x13/0x20
 [<ffffffff81190e00>] ? d_splice_alias+0x10/0x110
 [<ffffffff8118fda7>] ? __d_lookup+0xa7/0x150
 [<ffffffff8119031a>] ? dput+0x9a/0x150
 [<ffffffff8115f93c>] ? __kmalloc+0x20c/0x220
 [<ffffffffa035509f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]
 [<ffffffffa049c5ac>] zpl_setattr+0xdc/0x110 [zfs]
 [<ffffffff81193cc8>] notify_change+0x168/0x340
 [<ffffffff811846e5>] ? putname+0x35/0x50
 [<ffffffff811a7a9c>] utimes_common+0xdc/0x1b0
 [<ffffffff811a7c4e>] do_utimes+0xde/0xf0
 [<ffffffff811a7c90>] sys_futimesat+0x30/0xb0
 [<ffffffff811a7d29>] sys_utimes+0x19/0x20
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
@behlendorf
Contributor Author

This is likely the cause of the lockup in #676.

@DeHackEd
Contributor

http://dehacked.2y.net/sizes-4h.png
http://dehacked.2y.net/arc-4h.png
http://dehacked.2y.net/evict-4h.png

The workaround you gave in IRC (increasing ZFS_OBJ_MTX_SZ in include/sys/zfs_vfs.h to 256) does wonders with regard to fixing issue #676. Normally this system would hang within 30 minutes.
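
For context, the workaround is a one-line header change. A minimal sketch, assuming the hashed-mutex layout of the ZoL headers of this era; the prior default of 64 is inferred from the 6144-byte cost quoted in the later commit, so treat the numbers as illustrative:

/* Widen the hashed z_hold_mtx array so two object numbers are less
 * likely to map onto the same mutex.  Costs one kmutex_t per added
 * slot, per mounted filesystem. */
#define ZFS_OBJ_MTX_SZ 256 /* previous default: 64 (inferred) */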

@behlendorf
Contributor Author

Excellent, that's pretty good confirmation this is the root cause of your deadlocks. Increasing the array size sounds like a reasonable short-term workaround, but it just makes the issue less likely; it doesn't actually fix it. If you need to, you could increase the value further until we have a proper fix.

@behlendorf
Contributor Author

Here's the deadlock; it's basically a lock inversion. Now that I have the exact stacks I can put together a patch to prevent it. In the meantime, increasing ZFS_OBJ_MTX_SZ is a decent workaround.

du
Call Trace:
 [<ffffffff814eff92>] schedule_timeout+0x192/0x2e0
 [<ffffffffa0656746>] spl_kmem_cache_alloc+0xad6/0x18c0 [spl]
 [<ffffffffa070824e>] dbuf_create+0x8e/0x950 [zfs]
 [<ffffffffa070be51>] __dbuf_hold_impl+0x381/0xb40 [zfs]
 [<ffffffffa070c696>] dbuf_hold_impl+0x86/0xc0 [zfs]
 [<ffffffffa070ef80>] dbuf_hold+0x20/0x30 [zfs]
 [<ffffffffa07375a9>] dnode_hold_impl+0x119/0x9d0 [zfs]
 [<ffffffffa0737e79>] dnode_hold+0x19/0x20 [zfs]
 [<ffffffffa0719be7>] dmu_bonus_hold+0x37/0x3e0 [zfs]
 [<ffffffffa076c5ae>] sa_buf_hold+0xe/0x10 [zfs]
 [<ffffffffa07de15d>] zfs_zget+0x7d/0x430 [zfs]
 [<ffffffffa07b839b>] zfs_dirent_lock+0x5ab/0x6f0 [zfs]
 [<ffffffffa07b86f3>] zfs_dirlook+0x93/0x2e0 [zfs]
 [<ffffffffa07d6e9e>] zfs_lookup+0x2fe/0x350 [zfs]
 [<ffffffffa07f7b7c>] zpl_lookup+0x5c/0x120 [zfs]
 [<ffffffff8118604c>] do_lookup+0x16c/0x1e0
 [<ffffffff811868f4>] __link_path_walk+0x734/0x1030
 [<ffffffff8118747a>] path_walk+0x6a/0xe0
 [<ffffffff8118764b>] do_path_lookup+0x5b/0xa0
 [<ffffffff811882b7>] user_path_at+0x57/0xa0
 [<ffffffff8117cea6>] vfs_fstatat+0x46/0x80
 [<ffffffff8117cf04>] sys_newfstatat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

events/0
Call Trace:
 [<ffffffff814f07fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814f069b>] mutex_lock+0x2b/0x50
 [<ffffffffa07e01ff>] zfs_zinactive+0xaf/0x260 [zfs]
 [<ffffffffa07daa3f>] zfs_inactive+0x7f/0x3b0 [zfs]
 [<ffffffffa07f8a7e>] zpl_clear_inode+0xe/0x10 [zfs]
 [<ffffffff8119256c>] clear_inode+0xac/0x140
 [<ffffffff81192640>] dispose_list+0x40/0x120
 [<ffffffff81192994>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff81129c0a>] shrink_slab+0x12a/0x1a0
 [<ffffffff8112ba6f>] do_try_to_free_pages+0x2ff/0x520
 [<ffffffff8112be7d>] try_to_free_pages+0x9d/0x130
 [<ffffffff81123dad>] __alloc_pages_nodemask+0x40d/0x940
 [<ffffffff8115876a>] alloc_pages_current+0xaa/0x110
 [<ffffffff81121f4e>] __get_free_pages+0xe/0x50
 [<ffffffffa0654cc8>] kv_alloc+0x58/0x130 [spl]
 [<ffffffffa0654de0>] spl_cache_grow_work+0x40/0x4b0 [spl]
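
In outline, the two stacks above pair up like this. This is a hedged C sketch, not actual ZFS source; h() stands for whatever hash ZFS_OBJ_HOLD_ENTER applies to the object number:

/* Thread A (du): holds a hashed object lock, then allocates. */
ZFS_OBJ_HOLD_ENTER(zsb, obj1);            /* takes z_hold_mtx[h(obj1)]   */
zp = kmem_cache_alloc(cache, KM_SLEEP);   /* slab grow -> page alloc ... */
                                          /* ... -> direct FS reclaim    */

/* Thread B (events/0): performing that reclaim, evicts an inode. */
ZFS_OBJ_HOLD_ENTER(zsb, obj2);            /* needs z_hold_mtx[h(obj2)]   */

/* If h(obj1) == h(obj2), B blocks on the mutex A holds while A waits
 * for B's reclaim to complete the allocation: a deadlock. */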

@behlendorf
Contributor Author

@DeHackEd Can you please apply the above patch to the SPL? It should resolve the hard deadlock.

behlendorf added a commit to openzfs/spl that referenced this issue Nov 28, 2012
Allowing the spl_cache_grow_work() function to reclaim inodes
allows for two unlikely deadlocks.  Therefore, we clear __GFP_FS
for these allocations.  The two deadlocks are:

* While holding the ZFS_OBJ_HOLD_ENTER(zsb, obj1) lock a function
  calls kmem_cache_alloc() which happens to need to allocate a
  new slab.  To allocate the new slab we enter FS level reclaim
  and attempt to evict several inodes.  To evict these inodes we
  need to take the ZFS_OBJ_HOLD_ENTER(zsb, obj2) lock and it
  just happens that obj1 and obj2 use the same hashed lock.

* Similar to the first case however instead of getting blocked
  on the hash lock we block in txg_wait_open() which is waiting
  for the next txg which isn't coming because the txg_sync
  thread is blocked in kmem_cache_alloc().

Note this isn't a 100% fix because vmalloc() won't strictly
honor __GFP_FS.  However, in practice this is sufficient because
several very unlikely things must all occur concurrently.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs/zfs#1101
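
The shape of the fix, as a minimal sketch: mask __GFP_FS out of the slab-grow allocation so direct reclaim can never re-enter the filesystem. The function name below is hypothetical; the actual patch adjusts the allocation flags used by spl_cache_grow_work()/kv_alloc() in the SPL.

#include <linux/gfp.h>
#include <linux/slab.h>

/* With __GFP_FS cleared, the page allocator may still reclaim clean
 * page-cache pages, but it will not call back into the filesystem to
 * evict inodes -- breaking the recursion shown in the stacks above. */
static void *cache_grow_alloc(size_t size, gfp_t flags)
{
        return kmalloc(size, flags & ~__GFP_FS);
}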
@DeHackEd
Contributor

Running the latest Git masters for both SPL and ZFS, I hung the system like so:

http://pastebin.com/tuH8Y9kE

# free
             total       used       free     shared    buffers     cached
Mem:      16448180   16109116     339064          0       1520      16212
-/+ buffers/cache:   16091384     356796
Swap:      2097148     157880    1939268

Interestingly, it's not hard-deadlocked: I can still work in the areas of the disk that are not ZFS filesystems (with choppy performance). After doing killall -9 rsync it unjammed itself.

@behlendorf
Contributor Author

@DeHackEd Did you have similar issues with 6c5207088f732168569d1a0b29f5f949b91bb503 applied?

behlendorf added a commit that referenced this issue Nov 30, 2012
Due to the slightly increased size of the ZFS super block
caused by 30315d2 there are now allocation warnings.  The
allocation size is still small (just over 8k) and super
blocks are rarely allocated so we suppress the warning.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue #1101
@DeHackEd
Contributor

DeHackEd commented Dec 1, 2012

Combining 6c5207088f732168569d1a0b29f5f949b91bb503 with the other patches (the SPL patch and ZFS_OBJ_MTX_SZ raised to 256) produced a system that, under similar load, may still exceed its arc_meta_limit but didn't crash. I think stacking all these patches together is outright necessary for my workload. :/

@behlendorf
Contributor Author

@DeHackEd Just to be clear: with all of these patches applied your system is stable (aside from exceeding the limit)? If so, I'll revisit 6c5207088f732168569d1a0b29f5f949b91bb503 to see if it can be reworked into something which can be merged. I'd hoped to defer further changes in this area until 0.7.0, but perhaps something minimal can be done.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Increasing this limit costs us 6144 bytes of memory per mounted
filesystem, but this is a small price to pay for accomplishing
the following:

* Allows for up to 256-way concurrency when performing lookups,
  which helps performance when there are a large number of
  processes.

* Minimizes the likelihood of encountering the deadlock
  described in issue openzfs#1101.  Because vmalloc() won't strictly
  honor __GFP_FS there is still a very remote chance of a
  deadlock.  See the openzfs/spl@043f9b57 commit.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1101
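
The 256-way concurrency claim follows from how object numbers hash onto the z_hold_mtx array. A sketch of the scheme, with macro bodies reconstructed from contemporary ZoL headers, so treat them as illustrative rather than verbatim:

#define ZFS_OBJ_MTX_SZ        256
#define ZFS_OBJ_HASH(obj)     ((obj) & (ZFS_OBJ_MTX_SZ - 1))
#define ZFS_OBJ_HOLD_ENTER(zsb, obj) \
        mutex_enter(&(zsb)->z_hold_mtx[ZFS_OBJ_HASH(obj)])
#define ZFS_OBJ_HOLD_EXIT(zsb, obj) \
        mutex_exit(&(zsb)->z_hold_mtx[ZFS_OBJ_HASH(obj)])

Up to 256 distinct objects can then be held at once, and the deadlock requires two objects whose numbers collide modulo ZFS_OBJ_MTX_SZ. The quoted 6144-byte cost is consistent with growing the array from 64 to 256 slots at 32 bytes per kmutex_t (192 × 32 = 6144), assuming that mutex size.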
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Due to the slightly increased size of the ZFS super block
caused by 30315d2 there are now allocation warnings.  The
allocation size is still small (just over 8k) and super
blocks are rarely allocated so we suppress the warning.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1101