Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant freezing when free memory is low #3834

Closed
mihaivint opened this issue Sep 25, 2015 · 5 comments
Closed

Constant freezing when free memory is low #3834

mihaivint opened this issue Sep 25, 2015 · 5 comments
Milestone

Comments

@mihaivint
Copy link

There is enough cache on the system 4-5 GB but when free memory is low zfs freezes. Both 0.6.5.0 and 0.6.5.1
System is a virtual machine with 8GB ram 4Gb swap. It's running elasticsearch +rabbitmq on top of zfs. OS is Centos 7 3.10.0-229.11.1.el7.x86_64. Only thing set for zfs is:

options zfs zfs_arc_min=536870912
options zfs zfs_arc_max=2147483648

It constantly ends up with very high load due to zfs freezes :
On 0.6.5.0

Sep 23 16:34:26 log kernel: INFO: task kswapd0:47 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: kswapd0         D ffff88023fd13680     0    47      2 0x00000000
Sep 23 16:34:26 log kernel: ffff88023093bbd0 0000000000000046 ffff8802342bf1c0 ffff88023093bfd8
Sep 23 16:34:26 log kernel: ffff88023093bfd8 ffff88023093bfd8 ffff8802342bf1c0 ffff8802306cee00
Sep 23 16:34:26 log kernel: 0000000000000000 00000000006026dc ffff8802306ceec8 00000000006031c0
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bd40f>] arc_shrinker_func+0xaf/0x170 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff81169095>] shrink_slab+0x165/0x300
Sep 23 16:34:26 log kernel: [<ffffffff811c0381>] ? vmpressure+0x21/0x90
Sep 23 16:34:26 log kernel: [<ffffffff8116cce1>] balance_pgdat+0x4b1/0x5e0
Sep 23 16:34:26 log kernel: [<ffffffff8116cf83>] kswapd+0x173/0x450
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffff8116ce10>] ? balance_pgdat+0x5e0/0x5e0
Sep 23 16:34:26 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: INFO: task arc_prune:129 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: arc_prune       D ffff88023fc13680     0   129      2 0x00000000
Sep 23 16:34:26 log kernel: ffff880035f638b8 0000000000000046 ffff8802304e4fa0 ffff880035f63fd8
Sep 23 16:34:26 log kernel: ffff880035f63fd8 ffff880035f63fd8 ffff8802304e4fa0 ffff8801492eee88
Sep 23 16:34:26 log kernel: ffff8801492eee38 ffff8801492eee90 ffff8801492eee60 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00c46db>] dbuf_read+0x4ab/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00df95b>] ? dnode_rele_and_unlock+0x5b/0x90 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00df9c9>] ? dnode_rele+0x39/0x40 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cde6d>] ? dmu_buf_hold_noread+0xad/0x140 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cdf50>] dmu_buf_hold+0x50/0x80 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0131dcd>] zap_idx_to_blk+0xcd/0x180 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0132601>] zap_deref_leaf+0x61/0xa0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa013376f>] fzap_remove+0x3f/0xb0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0136f43>] ? zap_name_alloc+0x73/0xd0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0138e3b>] zap_remove_norm+0x17b/0x1e0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0138eb3>] zap_remove+0x13/0x20 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0133ea4>] zap_remove_int+0x54/0x80 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa01426f4>] zfs_rmnode+0x224/0x350 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff81607ac2>] ? mutex_lock+0x12/0x2f
Sep 23 16:34:26 log kernel: [<ffffffffa0163d88>] zfs_zinactive+0x168/0x180 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa015d4c7>] zfs_inactive+0x67/0x240 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff81166859>] ? truncate_pagecache+0x59/0x60
Sep 23 16:34:26 log kernel: [<ffffffffa0174ad3>] zpl_evict_inode+0x43/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811e1f87>] evict+0xa7/0x170
Sep 23 16:34:26 log kernel: [<ffffffff811e208e>] dispose_list+0x3e/0x50
Sep 23 16:34:26 log kernel: [<ffffffff811e2f63>] prune_icache_sb+0x163/0x320
Sep 23 16:34:26 log kernel: [<ffffffff811ca0a6>] prune_super+0xd6/0x1a0
Sep 23 16:34:26 log kernel: [<ffffffffa0153192>] zfs_sb_prune+0xa2/0xc0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0175135>] zpl_prune_sb+0x35/0x50 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00b754f>] arc_prune_task+0x1f/0x50 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa000d7ee>] taskq_thread+0x21e/0x420 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 23 16:34:26 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: INFO: task arc_reclaim:130 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: arc_reclaim     D ffff88023fc93680     0   130      2 0x00000000
Sep 23 16:34:26 log kernel: ffff880035f67db8 0000000000000046 ffff8802304e5b00 ffff880035f67fd8
Sep 23 16:34:26 log kernel: ffff880035f67fd8 ffff880035f67fd8 ffff8802304e5b00 ffff8802306cee00
Sep 23 16:34:26 log kernel: 0000000000000000 00000000006026dc ffff8802306ceec8 00000000006031c0
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bd324>] arc_reclaim_thread+0x1f4/0x230 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bd130>] ? arc_shrink+0xc0/0xc0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa000c931>] thread_generic_wrapper+0x71/0x80 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa000c8c0>] ? __thread_exit+0x20/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: INFO: task z_iput:707 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: z_iput          D ffff88023fc93680     0   707      2 0x00000000
Sep 23 16:34:26 log kernel: ffff8802329f3950 0000000000000046 ffff8802325eb8e0 ffff8802329f3fd8
Sep 23 16:34:26 log kernel: ffff8802329f3fd8 ffff8802329f3fd8 ffff8802325eb8e0 ffff8801492eee88
Sep 23 16:34:26 log kernel: ffff8801492eee38 ffff8801492eee90 ffff8801492eee60 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00c46db>] dbuf_read+0x4ab/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00df95b>] ? dnode_rele_and_unlock+0x5b/0x90 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00df9c9>] ? dnode_rele+0x39/0x40 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cde6d>] ? dmu_buf_hold_noread+0xad/0x140 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cdf50>] dmu_buf_hold+0x50/0x80 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0131dcd>] zap_idx_to_blk+0xcd/0x180 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0132601>] zap_deref_leaf+0x61/0xa0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cf0a7>] ? dmu_object_info_from_db+0x37/0x40 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0133385>] fzap_add_cd+0x55/0x140 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa000a9f7>] ? spl_kmem_alloc+0xc7/0x170 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa01334da>] fzap_add+0x6a/0x80 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0138866>] zap_add+0x126/0x180 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0133e28>] zap_add_int+0x68/0x90 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0100030>] ? gzip_compress+0xa0/0xa0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa014170c>] zfs_unlinked_add+0x3c/0x90 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa01426db>] zfs_rmnode+0x20b/0x350 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0163d88>] zfs_zinactive+0x168/0x180 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa015d4c7>] zfs_inactive+0x67/0x240 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff81166859>] ? truncate_pagecache+0x59/0x60
Sep 23 16:34:26 log kernel: [<ffffffffa0174ad3>] zpl_evict_inode+0x43/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811e1f87>] evict+0xa7/0x170
Sep 23 16:34:26 log kernel: [<ffffffff811e27c5>] iput+0xf5/0x180
Sep 23 16:34:26 log kernel: [<ffffffffa000d7ee>] taskq_thread+0x21e/0x420 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 23 16:34:26 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: INFO: task txg_quiesce:717 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: txg_quiesce     D ffff88023fd93680     0   717      2 0x00000000
Sep 23 16:34:26 log kernel: ffff88023327bd38 0000000000000046 ffff880232996660 ffff88023327bfd8
Sep 23 16:34:26 log kernel: ffff88023327bfd8 ffff88023327bfd8 ffff880232996660 ffff88022e7840b8
Sep 23 16:34:26 log kernel: ffff88022e784038 ffff88022e7840c0 ffff88022e784060 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa01220db>] txg_quiesce_thread+0x2eb/0x400 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0121df0>] ? txg_init+0x2b0/0x2b0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa000c931>] thread_generic_wrapper+0x71/0x80 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa000c8c0>] ? __thread_exit+0x20/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 23 16:34:26 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 23 16:34:26 log kernel: INFO: task java:2821 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: java            D ffff88023fc13680     0  2821      1 0x00000080
Sep 23 16:34:26 log kernel: ffff8802318e3920 0000000000000086 ffff88021e4acfa0 ffff8802318e3fd8
Sep 23 16:34:26 log kernel: ffff8802318e3fd8 ffff8802318e3fd8 ffff88021e4acfa0 ffffffffa022d6e0
Sep 23 16:34:26 log kernel: ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00b89c7>] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bbd3f>] arc_buf_alloc+0x10f/0x190 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bc5a5>] arc_read+0x1c5/0xad0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c2a40>] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c44dd>] dbuf_read+0x2ad/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cdf50>] dmu_buf_hold+0x50/0x80 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0137418>] zap_lockdir+0x58/0x900 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0137ebc>] zap_cursor_retrieve+0x1fc/0x320 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa010a8fc>] ? sa_lookup+0x9c/0xc0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811da35d>] ? filldir+0x7d/0xf0
Sep 23 16:34:26 log kernel: [<ffffffffa015856e>] zfs_readdir+0x14e/0x4c0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811e65ee>] ? mntput_no_expire+0x3e/0x120
Sep 23 16:34:26 log kernel: [<ffffffff811e66f4>] ? mntput+0x24/0x40
Sep 23 16:34:26 log kernel: [<ffffffff811d6736>] ? path_openat+0x166/0x490
Sep 23 16:34:26 log kernel: [<ffffffffa0106b4b>] ? rrm_exit+0x4b/0x90 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa01733f6>] zpl_readdir+0x76/0xc0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811da2e0>] ? fillonedir+0xe0/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff811da2e0>] ? fillonedir+0xe0/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff811da1d0>] vfs_readdir+0xb0/0xe0
Sep 23 16:34:26 log kernel: [<ffffffff811da5f5>] SyS_getdents+0x95/0x120
Sep 23 16:34:26 log kernel: [<ffffffff81614289>] system_call_fastpath+0x16/0x1b
Sep 23 16:34:26 log kernel: INFO: task java:2907 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: java            D ffff88023fd93680     0  2907      1 0x00000080
Sep 23 16:34:26 log kernel: ffff88009b3e79a8 0000000000000086 ffff88023426f1c0 ffff88009b3e7fd8
Sep 23 16:34:26 log kernel: ffff88009b3e7fd8 ffff88009b3e7fd8 ffff88023426f1c0 ffffffffa022d6e0
Sep 23 16:34:26 log kernel: ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00b89c7>] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bbd3f>] arc_buf_alloc+0x10f/0x190 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bc5a5>] arc_read+0x1c5/0xad0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c2a40>] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c44dd>] dbuf_read+0x2ad/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c4fd8>] __dbuf_hold_impl+0x478/0x500 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c50d3>] dbuf_hold_impl+0x73/0xa0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c540f>] dbuf_hold+0x2f/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ccdbd>] dmu_buf_hold_array_by_dnode+0xfd/0x4d0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa000a9f7>] ? spl_kmem_alloc+0xc7/0x170 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00cda92>] dmu_read_uio_dnode+0x52/0x100 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ceb47>] dmu_read_uio_dbuf+0x47/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0156a0d>] zfs_read+0x13d/0x400 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0173700>] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa017385e>] zpl_read+0x7e/0xb0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811c66ec>] vfs_read+0x9c/0x170
Sep 23 16:34:26 log kernel: [<ffffffff811c73b2>] SyS_pread64+0x92/0xc0
Sep 23 16:34:26 log kernel: [<ffffffff81614289>] system_call_fastpath+0x16/0x1b
Sep 23 16:34:26 log kernel: INFO: task java:2908 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: java            D ffff88023fc13680     0  2908      1 0x00000080
Sep 23 16:34:26 log kernel: ffff88009b3eba90 0000000000000086 ffff88023426cfa0 ffff88009b3ebfd8
Sep 23 16:34:26 log kernel: ffff88009b3ebfd8 ffff88009b3ebfd8 ffff88023426cfa0 ffffffffa022d6e0
Sep 23 16:34:26 log kernel: ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00b89c7>] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bc715>] arc_read+0x335/0xad0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c40a6>] ? dbuf_create+0x386/0x4e0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c2a40>] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c44dd>] dbuf_read+0x2ad/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ccde3>] dmu_buf_hold_array_by_dnode+0x123/0x4d0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cda92>] dmu_read_uio_dnode+0x52/0x100 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ceb47>] dmu_read_uio_dbuf+0x47/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0156a0d>] zfs_read+0x13d/0x400 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0173700>] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa017385e>] zpl_read+0x7e/0xb0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811c66ec>] vfs_read+0x9c/0x170
Sep 23 16:34:26 log kernel: [<ffffffff811c73b2>] SyS_pread64+0x92/0xc0
Sep 23 16:34:26 log kernel: [<ffffffff81614289>] system_call_fastpath+0x16/0x1b
Sep 23 16:34:26 log kernel: INFO: task java:2909 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: java            D ffff88023fc13680     0  2909      1 0x00000080
Sep 23 16:34:26 log kernel: ffff88009b3ef9a8 0000000000000086 ffff8802308ead80 ffff88009b3effd8
Sep 23 16:34:26 log kernel: ffff88009b3effd8 ffff88009b3effd8 ffff8802308ead80 ffffffffa022d6e0
Sep 23 16:34:26 log kernel: ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00b89c7>] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bbd3f>] arc_buf_alloc+0x10f/0x190 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bc5a5>] arc_read+0x1c5/0xad0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c2a40>] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c44dd>] dbuf_read+0x2ad/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c4fd8>] __dbuf_hold_impl+0x478/0x500 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c50d3>] dbuf_hold_impl+0x73/0xa0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c540f>] dbuf_hold+0x2f/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ccdbd>] dmu_buf_hold_array_by_dnode+0xfd/0x4d0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa000a9f7>] ? spl_kmem_alloc+0xc7/0x170 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00cda92>] dmu_read_uio_dnode+0x52/0x100 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ceb47>] dmu_read_uio_dbuf+0x47/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0156a0d>] zfs_read+0x13d/0x400 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0173700>] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa017385e>] zpl_read+0x7e/0xb0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811c66ec>] vfs_read+0x9c/0x170
Sep 23 16:34:26 log kernel: [<ffffffff811c73b2>] SyS_pread64+0x92/0xc0
Sep 23 16:34:26 log kernel: [<ffffffff81614289>] system_call_fastpath+0x16/0x1b
Sep 23 16:34:26 log kernel: INFO: task java:2910 blocked for more than 120 seconds.
Sep 23 16:34:26 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 23 16:34:26 log kernel: java            D ffff88023fc93680     0  2910      1 0x00000080
Sep 23 16:34:26 log kernel: ffff88009b3f3a90 0000000000000086 ffff8802308e8b60 ffff88009b3f3fd8
Sep 23 16:34:26 log kernel: ffff88009b3f3fd8 ffff88009b3f3fd8 ffff8802308e8b60 ffffffffa022d6e0
Sep 23 16:34:26 log kernel: ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
Sep 23 16:34:26 log kernel: Call Trace:
Sep 23 16:34:26 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 23 16:34:26 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 23 16:34:26 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 23 16:34:26 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 23 16:34:26 log kernel: [<ffffffffa00b89c7>] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00bc715>] arc_read+0x335/0xad0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c40a6>] ? dbuf_create+0x386/0x4e0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c2a40>] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00c44dd>] dbuf_read+0x2ad/0x930 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ccde3>] dmu_buf_hold_array_by_dnode+0x123/0x4d0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00cda92>] dmu_read_uio_dnode+0x52/0x100 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa00ceb47>] dmu_read_uio_dbuf+0x47/0x60 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0156a0d>] zfs_read+0x13d/0x400 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa0173700>] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffffa017385e>] zpl_read+0x7e/0xb0 [zfs]
Sep 23 16:34:26 log kernel: [<ffffffff811c66ec>] vfs_read+0x9c/0x170
Sep 23 16:34:26 log kernel: [<ffffffff811c73b2>] SyS_pread64+0x92/0xc0
Sep 23 16:34:26 log kernel: [<ffffffff81614289>] system_call_fastpath+0x16/0x1b

On 0.6.5.1

Sep 24 20:41:11 log kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
Sep 24 20:41:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:41:11 log kernel: kthreadd        D ffff88023fd13680     0     2      0 0x00000000
Sep 24 20:41:11 log kernel: ffff88023417b9d0 0000000000000046 ffff880234160b60 ffff88023417bfd8
Sep 24 20:41:11 log kernel: ffff88023417bfd8 ffff88023417bfd8 ffff880234160b60 ffff88022f89ee00
Sep 24 20:41:11 log kernel: 0000000000000000 0000000003c57e3e ffff88022f89eec8 0000000003c5b1da
Sep 24 20:41:11 log kernel: Call Trace:
Sep 24 20:41:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:41:11 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:41:11 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffffa00bd40f>] arc_shrinker_func+0xaf/0x170 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffff81169095>] shrink_slab+0x165/0x300
Sep 24 20:41:11 log kernel: [<ffffffff811c0381>] ? vmpressure+0x21/0x90
Sep 24 20:41:11 log kernel: [<ffffffff8116c1e2>] do_try_to_free_pages+0x3c2/0x4e0
Sep 24 20:41:11 log kernel: [<ffffffff8116c3fc>] try_to_free_pages+0xfc/0x180
Sep 24 20:41:11 log kernel: [<ffffffff81160a7d>] __alloc_pages_nodemask+0x7fd/0xb90
Sep 24 20:41:11 log kernel: [<ffffffff810b1c00>] ? update_curr+0x50/0x150
Sep 24 20:41:11 log kernel: [<ffffffff8106bee4>] copy_process.part.25+0x134/0x14d0
Sep 24 20:41:11 log kernel: [<ffffffff810abe95>] ? sched_clock_cpu+0xb5/0x100
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:41:11 log kernel: [<ffffffff8106d43c>] do_fork+0xbc/0x350
Sep 24 20:41:11 log kernel: [<ffffffff8106d6f6>] kernel_thread+0x26/0x30
Sep 24 20:41:11 log kernel: [<ffffffff81097d72>] kthreadd+0x2b2/0x2f0
Sep 24 20:41:11 log kernel: [<ffffffff81097ac0>] ? kthread_create_on_cpu+0x60/0x60
Sep 24 20:41:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:41:11 log kernel: [<ffffffff81097ac0>] ? kthread_create_on_cpu+0x60/0x60
Sep 24 20:41:11 log kernel: INFO: task kswapd0:47 blocked for more than 120 seconds.
Sep 24 20:41:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:41:11 log kernel: kswapd0         D ffff88023fd93680     0    47      2 0x00000000
Sep 24 20:41:11 log kernel: ffff88023093bbd0 0000000000000046 ffff880230930000 ffff88023093bfd8
Sep 24 20:41:11 log kernel: ffff88023093bfd8 ffff88023093bfd8 ffff880230930000 ffff88022f89ee00
Sep 24 20:41:11 log kernel: 0000000000000000 0000000003c57e3e ffff88022f89eec8 0000000003c5b1da
Sep 24 20:41:11 log kernel: Call Trace:
Sep 24 20:41:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:41:11 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:41:11 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffffa00bd40f>] arc_shrinker_func+0xaf/0x170 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffff81169095>] shrink_slab+0x165/0x300
Sep 24 20:41:11 log kernel: [<ffffffff811c0381>] ? vmpressure+0x21/0x90
Sep 24 20:41:11 log kernel: [<ffffffff8116cce1>] balance_pgdat+0x4b1/0x5e0
Sep 24 20:41:11 log kernel: [<ffffffff8116cf83>] kswapd+0x173/0x450
Sep 24 20:41:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:41:11 log kernel: [<ffffffff8116ce10>] ? balance_pgdat+0x5e0/0x5e0
Sep 24 20:41:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:41:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:41:11 log kernel: INFO: task spl_dynamic_tas:114 blocked for more than 120 seconds.
Sep 24 20:41:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:41:11 log kernel: spl_dynamic_tas D ffff88023fc93680     0   114      2 0x00000000
Sep 24 20:41:11 log kernel: ffff8802304f7b70 0000000000000046 ffff8802304971c0 ffff8802304f7fd8
Sep 24 20:41:11 log kernel: ffff8802304f7fd8 ffff8802304f7fd8 ffff8802304971c0 ffff8802304f7cd8
Sep 24 20:41:11 log kernel: ffff8802304f7ce0 7fffffffffffffff ffff8802304971c0 ffff8802304f9800
Sep 24 20:41:11 log kernel: Call Trace:
Sep 24 20:41:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:41:11 log kernel: [<ffffffff81607679>] schedule_timeout+0x209/0x2d0
Sep 24 20:41:11 log kernel: [<ffffffff810ae7a2>] ? select_task_rq_fair+0x552/0x6f0
Sep 24 20:41:11 log kernel: [<ffffffff8104a3c3>] ? x2apic_send_IPI_mask+0x13/0x20
Sep 24 20:41:11 log kernel: [<ffffffff81609c26>] wait_for_completion+0x116/0x170
Sep 24 20:41:11 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 24 20:41:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff81097108>] kthread_create_on_node+0xa8/0x140
Sep 24 20:41:11 log kernel: [<ffffffff812df19b>] ? string.isra.6+0x3b/0xf0
Sep 24 20:41:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:41:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:41:11 log kernel: [<ffffffffa000c9dc>] spl_kthread_create+0x9c/0xf0 [spl]
Sep 24 20:41:11 log kernel: [<ffffffffa000dffb>] taskq_thread_create+0x6b/0x110 [spl]
Sep 24 20:41:11 log kernel: [<ffffffffa000e0b2>] taskq_thread_spawn_task+0x12/0x30 [spl]
Sep 24 20:41:11 log kernel: [<ffffffffa000d7ee>] taskq_thread+0x21e/0x420 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 24 20:41:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:41:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:41:11 log kernel: INFO: task arc_reclaim:130 blocked for more than 120 seconds.
Sep 24 20:41:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:41:11 log kernel: arc_reclaim     D ffff88023fc13680     0   130      2 0x00000000
Sep 24 20:41:11 log kernel: ffff8800bb803db8 0000000000000046 ffff88022f8e16c0 ffff8800bb803fd8
Sep 24 20:41:11 log kernel: ffff8800bb803fd8 ffff8800bb803fd8 ffff88022f8e16c0 ffff88022f89ee00
Sep 24 20:41:11 log kernel: 0000000000000000 0000000003c57e3e ffff88022f89eec8 0000000003c5b1da
Sep 24 20:41:11 log kernel: Call Trace:
Sep 24 20:41:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:41:11 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:41:11 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffffa00bd324>] arc_reclaim_thread+0x1f4/0x230 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffffa00bd130>] ? arc_shrink+0xc0/0xc0 [zfs]
Sep 24 20:41:11 log kernel: [<ffffffffa000c931>] thread_generic_wrapper+0x71/0x80 [spl]
Sep 24 20:41:11 log kernel: [<ffffffffa000c8c0>] ? __thread_exit+0x20/0x20 [spl]
Sep 24 20:41:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:41:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:41:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
Sep 24 20:43:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:43:11 log kernel: kthreadd        D ffff88023fd13680     0     2      0 0x00000000
Sep 24 20:43:11 log kernel: ffff88023417b9d0 0000000000000046 ffff880234160b60 ffff88023417bfd8
Sep 24 20:43:11 log kernel: ffff88023417bfd8 ffff88023417bfd8 ffff880234160b60 ffff88022f89ee00
Sep 24 20:43:11 log kernel: 0000000000000000 0000000003c57e3e ffff88022f89eec8 0000000003c5b1da
Sep 24 20:43:11 log kernel: Call Trace:
Sep 24 20:43:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:43:11 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:43:11 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00bd40f>] arc_shrinker_func+0xaf/0x170 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffff81169095>] shrink_slab+0x165/0x300
Sep 24 20:43:11 log kernel: [<ffffffff811c0381>] ? vmpressure+0x21/0x90
Sep 24 20:43:11 log kernel: [<ffffffff8116c1e2>] do_try_to_free_pages+0x3c2/0x4e0
Sep 24 20:43:11 log kernel: [<ffffffff8116c3fc>] try_to_free_pages+0xfc/0x180
Sep 24 20:43:11 log kernel: [<ffffffff81160a7d>] __alloc_pages_nodemask+0x7fd/0xb90
Sep 24 20:43:11 log kernel: [<ffffffff810b1c00>] ? update_curr+0x50/0x150
Sep 24 20:43:11 log kernel: [<ffffffff8106bee4>] copy_process.part.25+0x134/0x14d0
Sep 24 20:43:11 log kernel: [<ffffffff810abe95>] ? sched_clock_cpu+0xb5/0x100
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: [<ffffffff8106d43c>] do_fork+0xbc/0x350
Sep 24 20:43:11 log kernel: [<ffffffff8106d6f6>] kernel_thread+0x26/0x30
Sep 24 20:43:11 log kernel: [<ffffffff81097d72>] kthreadd+0x2b2/0x2f0
Sep 24 20:43:11 log kernel: [<ffffffff81097ac0>] ? kthread_create_on_cpu+0x60/0x60
Sep 24 20:43:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:43:11 log kernel: [<ffffffff81097ac0>] ? kthread_create_on_cpu+0x60/0x60
Sep 24 20:43:11 log kernel: INFO: task kswapd0:47 blocked for more than 120 seconds.
Sep 24 20:43:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:43:11 log kernel: kswapd0         D ffff88023fd93680     0    47      2 0x00000000
Sep 24 20:43:11 log kernel: ffff88023093bbd0 0000000000000046 ffff880230930000 ffff88023093bfd8
Sep 24 20:43:11 log kernel: ffff88023093bfd8 ffff88023093bfd8 ffff880230930000 ffff88022f89ee00
Sep 24 20:43:11 log kernel: 0000000000000000 0000000003c57e3e ffff88022f89eec8 0000000003c5b1da
Sep 24 20:43:11 log kernel: Call Trace:
Sep 24 20:43:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:43:11 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:43:11 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00bd40f>] arc_shrinker_func+0xaf/0x170 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffff81169095>] shrink_slab+0x165/0x300
Sep 24 20:43:11 log kernel: [<ffffffff811c0381>] ? vmpressure+0x21/0x90
Sep 24 20:43:11 log kernel: [<ffffffff8116cce1>] balance_pgdat+0x4b1/0x5e0
Sep 24 20:43:11 log kernel: [<ffffffff8116cf83>] kswapd+0x173/0x450
Sep 24 20:43:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:43:11 log kernel: [<ffffffff8116ce10>] ? balance_pgdat+0x5e0/0x5e0
Sep 24 20:43:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: INFO: task khugepaged:49 blocked for more than 120 seconds.
Sep 24 20:43:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:43:11 log kernel: khugepaged      D ffff88023fd13680     0    49      2 0x00000000
Sep 24 20:43:11 log kernel: ffff88023098b4e8 0000000000000046 ffff8802309316c0 ffff88023098bfd8
Sep 24 20:43:11 log kernel: ffff88023098bfd8 ffff88023098bfd8 ffff8802309316c0 ffffffffa022d6e0
Sep 24 20:43:11 log kernel: ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
Sep 24 20:43:11 log kernel: Call Trace:
Sep 24 20:43:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:43:11 log kernel: [<ffffffffa00118d5>] cv_wait_common+0x125/0x150 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:43:11 log kernel: [<ffffffffa0011915>] __cv_wait+0x15/0x20 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa00b89c7>] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00bbd3f>] arc_buf_alloc+0x10f/0x190 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00c4e5a>] __dbuf_hold_impl+0x2fa/0x500 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00c50d3>] dbuf_hold_impl+0x73/0xa0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00c540f>] dbuf_hold+0x2f/0x60 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00cde47>] dmu_buf_hold_noread+0x87/0x140 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00cdf2f>] dmu_buf_hold+0x2f/0x80 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa013233c>] zap_get_leaf_byblk+0x5c/0x2c0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00c2bae>] ? dmu_buf_rele+0xe/0x10 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa0131e03>] ? zap_idx_to_blk+0x103/0x180 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa013261a>] zap_deref_leaf+0x7a/0xa0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa013376f>] fzap_remove+0x3f/0xb0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa0136f43>] ? zap_name_alloc+0x73/0xd0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa0138e3b>] zap_remove_norm+0x17b/0x1e0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa0138eb3>] zap_remove+0x13/0x20 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa0133ea4>] zap_remove_int+0x54/0x80 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa01426f4>] zfs_rmnode+0x224/0x350 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffff81607ac2>] ? mutex_lock+0x12/0x2f
Sep 24 20:43:11 log kernel: [<ffffffffa0163d88>] zfs_zinactive+0x168/0x180 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa015d4c7>] zfs_inactive+0x67/0x240 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffff81166859>] ? truncate_pagecache+0x59/0x60
Sep 24 20:43:11 log kernel: [<ffffffffa0174ad3>] zpl_evict_inode+0x43/0x60 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffff811e1f87>] evict+0xa7/0x170
Sep 24 20:43:11 log kernel: [<ffffffff811e208e>] dispose_list+0x3e/0x50
Sep 24 20:43:11 log kernel: [<ffffffff811e2f63>] prune_icache_sb+0x163/0x320
Sep 24 20:43:11 log kernel: [<ffffffff811ca0a6>] prune_super+0xd6/0x1a0
Sep 24 20:43:11 log kernel: [<ffffffff81169095>] shrink_slab+0x165/0x300
Sep 24 20:43:11 log kernel: [<ffffffff8116c1e2>] do_try_to_free_pages+0x3c2/0x4e0
Sep 24 20:43:11 log kernel: [<ffffffff8116c3fc>] try_to_free_pages+0xfc/0x180
Sep 24 20:43:11 log kernel: [<ffffffff81160a7d>] __alloc_pages_nodemask+0x7fd/0xb90
Sep 24 20:43:11 log kernel: [<ffffffff811b4129>] khugepaged_scan_mm_slot+0x3c9/0xc60
Sep 24 20:43:11 log kernel: [<ffffffff811b4c17>] khugepaged+0x257/0x480
Sep 24 20:43:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:43:11 log kernel: [<ffffffff811b49c0>] ? khugepaged_scan_mm_slot+0xc60/0xc60
Sep 24 20:43:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: INFO: task spl_dynamic_tas:114 blocked for more than 120 seconds.
Sep 24 20:43:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:43:11 log kernel: spl_dynamic_tas D ffff88023fc93680     0   114      2 0x00000000
Sep 24 20:43:11 log kernel: ffff8802304f7b70 0000000000000046 ffff8802304971c0 ffff8802304f7fd8
Sep 24 20:43:11 log kernel: ffff8802304f7fd8 ffff8802304f7fd8 ffff8802304971c0 ffff8802304f7cd8
Sep 24 20:43:11 log kernel: ffff8802304f7ce0 7fffffffffffffff ffff8802304971c0 ffff8802304f9800
Sep 24 20:43:11 log kernel: Call Trace:
Sep 24 20:43:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:43:11 log kernel: [<ffffffff81607679>] schedule_timeout+0x209/0x2d0
Sep 24 20:43:11 log kernel: [<ffffffff810ae7a2>] ? select_task_rq_fair+0x552/0x6f0
Sep 24 20:43:11 log kernel: [<ffffffff8104a3c3>] ? x2apic_send_IPI_mask+0x13/0x20
Sep 24 20:43:11 log kernel: [<ffffffff81609c26>] wait_for_completion+0x116/0x170
Sep 24 20:43:11 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 24 20:43:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff81097108>] kthread_create_on_node+0xa8/0x140
Sep 24 20:43:11 log kernel: [<ffffffff812df19b>] ? string.isra.6+0x3b/0xf0
Sep 24 20:43:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa000c9dc>] spl_kthread_create+0x9c/0xf0 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa000dffb>] taskq_thread_create+0x6b/0x110 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa000e0b2>] taskq_thread_spawn_task+0x12/0x30 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa000d7ee>] taskq_thread+0x21e/0x420 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 24 20:43:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: INFO: task arc_reclaim:130 blocked for more than 120 seconds.
Sep 24 20:43:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:43:11 log kernel: arc_reclaim     D ffff88023fc13680     0   130      2 0x00000000
Sep 24 20:43:11 log kernel: ffff8800bb803db8 0000000000000046 ffff88022f8e16c0 ffff8800bb803fd8
Sep 24 20:43:11 log kernel: ffff8800bb803fd8 ffff8800bb803fd8 ffff88022f8e16c0 ffff88022f89ee00
Sep 24 20:43:11 log kernel: 0000000000000000 0000000003c57e3e ffff88022f89eec8 0000000003c5b1da
Sep 24 20:43:11 log kernel: Call Trace:
Sep 24 20:43:11 log kernel: [<ffffffff81609729>] schedule+0x29/0x70
Sep 24 20:43:11 log kernel: [<ffffffffa000dd1d>] taskq_wait_outstanding+0x9d/0x110 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Sep 24 20:43:11 log kernel: [<ffffffffa00b7523>] arc_kmem_reap_now+0xf3/0x100 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00bd324>] arc_reclaim_thread+0x1f4/0x230 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00bd130>] ? arc_shrink+0xc0/0xc0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa000c931>] thread_generic_wrapper+0x71/0x80 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa000c8c0>] ? __thread_exit+0x20/0x20 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: INFO: task z_wr_iss:678 blocked for more than 120 seconds.
Sep 24 20:43:11 log kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 24 20:43:11 log kernel: z_wr_iss        D ffff88023fc13680     0   678      2 0x00000000
Sep 24 20:43:11 log kernel: ffff8800bb30bc80 0000000000000046 ffff88022f75b8e0 ffff8800bb30bfd8
Sep 24 20:43:11 log kernel: ffff8800bb30bfd8 ffff8800bb30bfd8 ffff88022f75b8e0 ffff88008d1079c8
Sep 24 20:43:11 log kernel: ffff88008d1079cc ffff88022f75b8e0 00000000ffffffff ffff88008d1079d0
Sep 24 20:43:11 log kernel: Call Trace:
Sep 24 20:43:11 log kernel: [<ffffffff8160a919>] schedule_preempt_disabled+0x29/0x70
Sep 24 20:43:11 log kernel: [<ffffffff81608665>] __mutex_lock_slowpath+0xc5/0x1c0
Sep 24 20:43:11 log kernel: [<ffffffff81607acf>] mutex_lock+0x1f/0x2f
Sep 24 20:43:11 log kernel: [<ffffffffa00c1f55>] dbuf_write_ready+0x85/0x210 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa00b7798>] arc_write_ready+0x28/0xc0 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa016e91c>] zio_ready+0x6c/0x470 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa000cc82>] ? taskq_member+0x62/0x70 [spl]
Sep 24 20:43:11 log kernel: [<ffffffffa016ae98>] zio_execute+0xc8/0x180 [zfs]
Sep 24 20:43:11 log kernel: [<ffffffffa000d7ee>] taskq_thread+0x21e/0x420 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff810a9500>] ? wake_up_state+0x20/0x20
Sep 24 20:43:11 log kernel: [<ffffffffa000d5d0>] ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 24 20:43:11 log kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Sep 24 20:43:11 log kernel: [<ffffffff816141d8>] ret_from_fork+0x58/0x90
Sep 24 20:43:11 log kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
@behlendorf
Copy link
Contributor

This is a duplicate of #3808 and there is fix in #3826 which will appear in the next point release.

@behlendorf behlendorf added this to the 0.6.5.2 milestone Sep 25, 2015
@odoucet
Copy link

odoucet commented Sep 25, 2015

@behlendorf you linked this issue, not the fix that will appear in the next point release ;)

@behlendorf
Copy link
Contributor

@odoucet whoops! Fixed, sorry I was off by a digit. Pull request #3826 contains the fix.

@behlendorf
Copy link
Contributor

Closing as duplicate of #3808.

behlendorf added a commit that referenced this issue Sep 25, 2015
As described in the comment above arc_reclaim_thread() it's critical
that the reclaim thread be careful about blocking.  Just like it must
never wait on a hash lock, it must never wait on a task which can in
turn wait on the CV in arc_get_data_buf().  This will deadlock, see
issue #3822 for full backtraces showing the problem.

To resolve this issue arc_kmem_reap_now() has been updated to use the
asynchronous arc prune function.  This means that arc_prune_async()
may now be called while there are still outstanding arc_prune_tasks.
However, this isn't a problem because arc_prune_async() already
keeps a reference count preventing multiple outstanding tasks per
registered consumer.  Functionally, this behavior is the same as
the counterpart illumos function dnlc_reduce_cache().

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Issue #3808
Issue #3834
Issue #3822
behlendorf added a commit that referenced this issue Sep 30, 2015
As described in the comment above arc_reclaim_thread() it's critical
that the reclaim thread be careful about blocking.  Just like it must
never wait on a hash lock, it must never wait on a task which can in
turn wait on the CV in arc_get_data_buf().  This will deadlock, see
issue #3822 for full backtraces showing the problem.

To resolve this issue arc_kmem_reap_now() has been updated to use the
asynchronous arc prune function.  This means that arc_prune_async()
may now be called while there are still outstanding arc_prune_tasks.
However, this isn't a problem because arc_prune_async() already
keeps a reference count preventing multiple outstanding tasks per
registered consumer.  Functionally, this behavior is the same as
the counterpart illumos function dnlc_reduce_cache().

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Issue #3808
Issue #3834
Issue #3822
MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this issue Oct 17, 2015
ZFS/SPL 0.6.5.2

Bug Fixes
* Init script fixes openzfs/zfs#3816
* Fix uioskip crash when skip to end openzfs/zfs#3806
  openzfs/zfs#3850
* Userspace can trigger an assertion openzfs/zfs#3792
* Fix quota userused underflow bug openzfs/zfs#3789
* Fix performance regression from unwanted synchronous I/O
  openzfs/zfs#3780
* Fix deadlock during ARC reclaim openzfs/zfs#3808
  openzfs/zfs#3834
* Fix deadlock with zfs receive and clamscan openzfs/zfs#3719
* Allow NFS activity to defer snapshot unmounts openzfs/zfs#3794
* Linux 4.3 compatibility openzfs/zfs#3799
* Zed reload fixes openzfs/zfs#3773
* Fix PAX Patch/Grsec SLAB_USERCOPY panic openzfs/zfs#3796
* Always remove during dkms uninstall/update openzfs/spl#476

ZFS/SPL 0.6.5.1

Bug Fixes

* Fix zvol corruption with TRIM/discard openzfs/zfs#3798
* Fix NULL as mount(2) syscall data parameter openzfs/zfs#3804
* Fix xattr=sa dataset property not honored openzfs/zfs#3787

ZFS/SPL 0.6.5

Supported Kernels

* Compatible with 2.6.32 - 4.2 Linux kernels.

New Functionality

* Support for temporary mount options.
* Support for accessing the .zfs/snapshot over NFS.
* Support for estimating send stream size when source is a bookmark.
* Administrative commands are allowed to use reserved space improving
  robustness.
* New notify ZEDLETs support email and pushbullet notifications.
* New keyword 'slot' for vdev_id.conf to control what is use for the
  slot number.
* New zpool export -a option unmounts and exports all imported pools.
* New zpool iostat -y omits the first report with statistics since
  boot.
* New zdb can now open the root dataset.
* New zdb can print the numbers of ganged blocks.
* New zdb -ddddd can print details of block pointer objects.
* New zdb -b performance improved.
* New zstreamdump -d prints contents of blocks.

New Feature Flags

* large_blocks - This feature allows the record size on a dataset to
be set larger than 128KB. We currently support block sizes from 512
bytes to 16MB. The benefits of larger blocks, and thus larger IO, need
to be weighed against the cost of COWing a giant block to modify one
byte. Additionally, very large blocks can have an impact on I/O
latency, and also potentially on the memory allocator. Therefore, we
do not allow the record size to be set larger than zfs_max_recordsize
(default 1MB). Larger blocks can be created by changing this tuning,
pools with larger blocks can always be imported and used, regardless
of this setting.

* filesystem_limits - This feature enables filesystem and snapshot
limits. These limits can be used to control how many filesystems
and/or snapshots can be created at the point in the tree on which the
limits are set.

*Performance*

* Improved zvol performance on all kernels (>50% higher throughput,
  >20% lower latency)
* Improved zil performance on Linux 2.6.39 and earlier kernels (10x
  lower latency)
* Improved allocation behavior on mostly full SSD/file pools (5% to
  10% improvement on 90% full pools)
* Improved performance when removing large files.
* Caching improvements (ARC):
** Better cached read performance due to reduced lock contention.
** Smarter heuristics for managing the total size of the cache and the
   distribution of data/metadata.
** Faster release of cached buffers due to unexpected memory pressure.

*Changes in Behavior*

* Default reserved space was increased from 1.6% to 3.3% of total pool
capacity. This default percentage can be controlled through the new
spa_slop_shift module option, setting it to 6 will restore the
previous percentage.

* Loading of the ZFS module stack is now handled by systemd or the
sysv init scripts. Invoking the zfs/zpool commands will not cause the
modules to be automatically loaded. The previous behavior can be
restored by setting the ZFS_MODULE_LOADING=yes environment variable
but this functionality will be removed in a future release.

* Unified SYSV and Gentoo OpenRC initialization scripts. The previous
functionality has been split in to zfs-import, zfs-mount, zfs-share,
and zfs-zed scripts. This allows for independent control of the
services and is consistent with the unit files provided for a systemd
based system. Complete details of the functionality provided by the
updated scripts can be found here.

* Task queues are now dynamic and worker threads will be created and
destroyed as needed. This allows the system to automatically tune
itself to ensure the optimal number of threads are used for the active
workload which can result in a performance improvement.

* Task queue thread priorities were correctly aligned with the default
Linux file system thread priorities. This allows ZFS to compete fairly
with other active Linux file systems when the system is under heavy
load.

* When compression=on the default compression algorithm will be lz4 as
long as the feature is enabled. Otherwise the default remains lzjb.
Similarly lz4 is now the preferred method for compressing meta data
when available.

* The use of mkdir/rmdir/mv in the .zfs/snapshot directory has been
disabled by default both locally and via NFS clients. The
zfs_admin_snapshot module option can be used to re-enable this
functionality.

* LBA weighting is automatically disabled on files and SSDs ensuring
the entire device is used fairly.
* iostat accounting on zvols running on kernels older than Linux 3.19
is no longer supported.

* The known issues preventing swap on zvols for Linux 3.9 and newer
kernels have been resolved. However, deadlocks are still possible for
older kernels.

Module Options

* Changed zfs_arc_c_min default from 4M to 32M to accommodate large
  blocks.
* Added metaslab_aliquot to control how many bytes are written to a
  top-level vdev before moving on to the next one. Increasing this may
  be helpful when using blocks larger than 1M.
* Added spa_slop_shift, see 'reserved space' comment in the 'changes
  to behavior' section.
* Added zfs_admin_snapshot, enable/disable the use of mkdir/rmdir/mv
  in .zfs/snapshot directory.
* Added zfs_arc_lotsfree_percent, throttle I/O when free system
  memory drops below this percentage.
* Added zfs_arc_num_sublists_per_state, used to allow more
  fine-grained locking.
* Added zfs_arc_p_min_shift, used to set a floor on arc_p.
* Added zfs_arc_sys_free, the target number of bytes the ARC should
  leave as free.
* Added zfs_dbgmsg_enable, used to enable the 'dbgmsg' kstat.
* Added zfs_dbgmsg_maxsize, sets the maximum size of the dbgmsg
  buffer.
* Added zfs_max_recordsize, used to control the maximum allowed
  record size.
* Added zfs_arc_meta_strategy, used to select the preferred ARC
  reclaim strategy.
* Removed metaslab_min_alloc_size, it was unused internally due to
  prior changes.
* Removed zfs_arc_memory_throttle_disable, replaced by
  zfs_arc_lotsfree_percent.
* Removed zvol_threads, zvols no longer require a dedicated task
  queue.
* See zfs-module-parameters(5) for complete details on available
  module options.

Bug Fixes

* Improved documentation with many updates, corrections, and
  additions.
* Improved sysv, systemd, initramfs, and dracut support.
* Improved block pointer validation before issuing IO.
* Improved scrub pause heuristics.
* Improved test coverage.
* Improved heuristics for automatic repair when zfs_recover=1 module
  option is set.
* Improved debugging infrastructure via 'dbgmsg' kstat.
* Improved zpool import performance.
* Fixed deadlocks in direct memory reclaim.
* Fixed deadlock on db_mtx and dn_holds.
* Fixed deadlock in dmu_objset_find_dp().
* Fixed deadlock during zfs rollback.
* Fixed kernel panic due to tsd_exit() in ZFS_EXIT.
* Fixed kernel panic when adding a duplicate dbuf to dn_dbufs.
* Fixed kernel panic due to security / ACL creation failure.
* Fixed kernel panic on unmount due to iput taskq.
* Fixed panic due to corrupt nvlist when running utilities.
* Fixed panic on unmount due to not waiting for all znodes to be
  released.
* Fixed panic with zfs clone from different source and target pools.
* Fixed NULL pointer dereference in dsl_prop_get_ds().
* Fixed NULL pointer dereference in dsl_prop_notify_all_cb().
* Fixed NULL pointer dereference in zfsdev_getminor().
* Fixed I/Os are now aggregated across ZIO priority classes.
* Fixed .zfs/snapshot auto-mounting for all supported kernels.
* Fixed 3-digit octal escapes by changing to 4-digit which
  disambiguate the output.
* Fixed hard lockup due to infinite loop in zfs_zget().
* Fixed misreported 'alloc' value for cache devices.
* Fixed spurious hung task watchdog stack traces.
* Fixed direct memory reclaim deadlocks.
* Fixed module loading in zfs import systemd service.
* Fixed intermittent libzfs_init() failure to open /dev/zfs.
* Fixed hot-disk sparing for disk vdevs
* Fixed system spinning during ARC reclaim.
* Fixed formatting errors in {{zfs(8)}}
* Fixed zio pipeline stall by having callers invoke next stage.
* Fixed assertion failed in zrl_tryenter().
* Fixed memory leak in make_root_vdev().
* Fixed memory leak in zpool_in_use().
* Fixed memory leak in libzfs when doing rollback.
* Fixed hold leak in dmu_recv_end_check().
* Fixed refcount leak in bpobj_iterate_impl().
* Fixed misuse of input argument in traverse_visitbp().
* Fixed missing missing mutex_destroy() calls.
* Fixed integer overflows in dmu_read/dmu_write.
* Fixed verify() failure in zio_done().
* Fixed zio_checksum_error() to only include info for ECKSUM errors.
* Fixed -ESTALE to force lookup on missing NFS file handles.
* Fixed spurious failures from dsl_dataset_hold_obj().
* Fixed zfs compressratio when using with 4k sector size.
* Fixed spurious watchdog warnings in prefetch thread.
* Fixed unfair disk space allocation when vdevs are of unequal size.
* Fixed ashift accounting error writing to cache devices.
* Fixed zdb -d has false positive warning when
  feature@large_blocks=disabled.
* Fixed zdb -h | -i seg fault.
* Fixed force-received full stream into a dataset if it has a
  snapshot.
* Fixed snapshot error handling.
* Fixed 'hangs' while deleting large files.
* Fixed lock contention (rrw_exit) while running a read only load.
* Fixed error message when creating a pool to include all problematic
  devices.
* Fixed Xen virtual block device detection, partitions are now
  created.
* Fixed missing E2BIG error handling in zfs_setprop_error().
* Fixed zpool import assertion in libzfs_import.c.
* Fixed zfs send -nv output to stderr.
* Fixed idle pool potentially running itself out of space.
* Fixed narrow race which allowed read(2) to access beyond fstat(2)'s
  reported end-of-file.
* Fixed support for VPATH builds.
* Fixed double counting of HDR_L2ONLY_SIZE in ARC.
* Fixed 'BUG: Bad page state' warning from kernel due to writeback
  flag.
* Fixed arc_available_memory() to check freemem.
* Fixed arc_memory_throttle() to check pageout.
* Fixed'zpool create warning when using zvols in debug builds.
* Fixed loop devices layered on ZFS with 4.1 kernels.
* Fixed zvol contribution to kernel entropy pool.
* Fixed handling of compression flags in arc header.
* Substantial changes to realign code base with illumos.
* Many additional bug fixes.

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I87c012aec9ec581b10a417d699dafc7d415abf63
Reviewed-on: http://review.whamcloud.com/16399
Tested-by: Jenkins
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
@rfehren
Copy link

rfehren commented Oct 19, 2015

It seems, this problem is still not completely fixed. With kernel 3.12.x and 0.65.2 My nodes still hang reproducably with swap on a zvol. Very simple test case: Run something creating permanent disk I/O like e.g. a fio job and then start a Linpack job in parallel that is active on all cores and consumes all RAM. After some minutes you will see the familiar hung tasks messages and a little later a total dead-lock.

[ 8292.590906] INFO: task kswapd0:145 blocked for more than 120 seconds.
[ 8292.597367]       Tainted: P           O 3.12.48-ql-generic-9.1-61 #1
[ 8292.603825] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8292.611812] INFO: task kswapd1:146 blocked for more than 120 seconds.
[ 8292.618264]       Tainted: P           O 3.12.48-ql-generic-9.1-61 #1
[ 8292.624719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8292.632710] INFO: task rpcbind:1855 blocked for more than 120 seconds.
[ 8292.639247]       Tainted: P           O 3.12.48-ql-generic-9.1-61 #1
[ 8292.645701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8292.653650] INFO: task arc_reclaim:2899 blocked for more than 120 seconds.
[ 8292.660537]       Tainted: P           O 3.12.48-ql-generic-9.1-61 #1
[ 8292.666992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8292.675546] INFO: task txg_sync:3753 blocked for more than 120 seconds.
[ 8292.682456]       Tainted: P           O 3.12.48-ql-generic-9.1-61 #1
[ 8292.689184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants