soft lockup errors writing to zvol #922

Closed
cwedgwood opened this issue Sep 1, 2012 · 49 comments
Labels
Component: Memory Management · Component: ZVOL · Type: Documentation

Comments

@cwedgwood
Contributor

create a zvol, dd data into it and fairly quickly we get:

[ 622.083345] INFO: task flush-230:0:4232 blocked for more than 20 seconds.
[ 622.090558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 622.098746] flush-230:0 D ffff8803318503a0 4224 4232 2 0x00000000
[ 622.106315] ffff88060bda37d0 0000000000000046 ffff880331850000 ffff88060bda3fd8
[ 622.114453] ffff88060bda3fd8 0000000000012780 ffffffff817633f0 ffff880331850000
[ 622.122625] ffff88060bda37a0 ffff880333c12780 ffff880331850000 ffff8802c88aec00
[ 622.136105] Call Trace:
[ 622.138728] [] schedule+0x65/0x67
[ 622.144026] [] io_schedule+0x60/0x7a
[ 622.149601] [] get_request_wait+0xbd/0x166
[ 622.155723] [] ? cfq_merge+0x72/0xa1
[ 622.161295] [] ? abort_exclusive_wait+0x8f/0x8f
[ 622.167828] [] blk_queue_bio+0x193/0x2d6
[ 622.173722] [] generic_make_request+0x9c/0xdd
[ 622.179999] [] submit_bio+0xbb/0xd4
[ 622.185405] [] ? inc_zone_page_state+0x27/0x29
[ 622.191903] [] submit_bh+0xf6/0x116
[ 622.197283] [] __block_write_full_page+0x200/0x2fd
[ 622.204185] [] ? blkdev_get_blocks+0x93/0x93
[ 622.210460] [] ? drop_buffers+0x96/0x96
[ 622.216355] [] ? blkdev_get_blocks+0x93/0x93
[ 622.222542] [] ? drop_buffers+0x96/0x96
[ 622.228452] [] block_write_full_page_endio+0x89/0x95
[ 622.235593] [] block_write_full_page+0x15/0x17
[ 622.242034] [] blkdev_writepage+0x18/0x1a
[ 622.248060] [] __writepage+0x14/0x2d
[ 622.253732] [] ? page_index+0x1a/0x1a
[ 622.259396] [] write_cache_pages+0x22e/0x366
[ 622.265710] [] ? page_index+0x1a/0x1a
[ 622.271408] [] generic_writepages+0x3e/0x58
[ 622.277606] [] do_writepages+0x1e/0x2b
[ 622.283350] [] __writeback_single_inode.isra.31+0x4c/0x123
[ 622.290891] [] writeback_sb_inodes+0x1d3/0x310
[ 622.297388] [] __writeback_inodes_wb+0x74/0xb9
[ 622.303833] [] wb_writeback+0x136/0x26c
[ 622.309663] [] ? global_dirty_limits+0x2f/0x10e
[ 622.316218] [] wb_do_writeback+0x185/0x1bb
[ 622.322283] [] bdi_writeback_thread+0xa5/0x1ce
[ 622.328859] [] ? wb_do_writeback+0x1bb/0x1bb
[ 622.335219] [] kthread+0x8b/0x93
[ 622.340398] [] kernel_thread_helper+0x4/0x10
[ 622.346784] [] ? retint_restore_args+0x13/0x13
[ 622.353261] [] ? kthread_worker_fn+0x149/0x149
[ 622.359759] [] ? gs_change+0x13/0x13
[ 622.365390] INFO: task blkid:4280 blocked for more than 20 seconds.
[ 622.372076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 622.380445] blkid D ffff8806313f03a0 5720 4280 4221 0x00000000
[ 622.387989] ffff8804dcd37ac8 0000000000000082 ffff8806313f0000 ffff8804dcd37fd8
[ 622.396128] ffff8804dcd37fd8 0000000000012780 ffff880630e38000 ffff8806313f0000
[ 622.404338] ffffffff810608f2 ffff88062f038918 ffff8806313f0000 ffff88062f03891c
[ 622.412436] Call Trace:
[ 622.415116] [] ? need_resched+0x11/0x1d
[ 622.420913] [] schedule+0x65/0x67
[ 622.426238] [] schedule_preempt_disabled+0xe/0x10
[ 622.433118] [] __mutex_lock_common.isra.7+0x14a/0x166
[ 622.440210] [] __mutex_lock_slowpath+0x13/0x15
[ 622.446826] [] mutex_lock+0x18/0x29
[ 622.452350] [] __blkdev_get+0x9c/0x3da
[ 622.458013] [] ? blkdev_get+0x2ce/0x2ce
[ 622.463767] [] blkdev_get+0x189/0x2ce
[ 622.469289] [] ? find_get_page+0x4a/0x6a
[ 622.475095] [] ? __d_lookup_rcu+0xa2/0xc9
[ 622.480979] [] ? blkdev_get+0x2ce/0x2ce
[ 622.486742] [] blkdev_open+0x64/0x70
[ 622.492188] [] do_dentry_open.isra.17+0x16e/0x21d
[ 622.498880] [] nameidata_to_filp+0x42/0x84
[ 622.504952] [] do_last.isra.47+0x625/0x64b
[ 622.510996] [] path_openat+0xc5/0x2f4
[ 622.516569] [] do_filp_open+0x38/0x86
[ 622.522157] [] ? getname_flags+0x2a/0xa2
[ 622.528120] [] ? alloc_fd+0xe5/0xf7
[ 622.533606] [] do_sys_open+0x6e/0x102
[ 622.539253] [] sys_open+0x21/0x23
[ 622.544665] [] system_call_fastpath+0x16/0x1b
[ 642.511953] INFO: task flush-230:0:4232 blocked for more than 20 seconds.
[ 642.519226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 642.527644] flush-230:0 D ffff8803318503a0 4224 4232 2 0x00000000
[ 642.535269] ffff88060bda37d0 0000000000000046 ffff880331850000 ffff88060bda3fd8
[ 642.543417] ffff88060bda3fd8 0000000000012780 ffffffff817633f0 ffff880331850000
[ 642.551527] ffff88060bda37a0 ffff880333c12780 ffff880331850000 ffff8802c88aec00
[ 642.559693] Call Trace:
[ 642.562439] [] schedule+0x65/0x67
[ 642.567711] [] io_schedule+0x60/0x7a
[ 642.573354] [] get_request_wait+0xbd/0x166
[ 642.579492] [] ? cfq_merge+0x72/0xa1
[ 642.585158] [] ? abort_exclusive_wait+0x8f/0x8f
[ 642.591849] [] blk_queue_bio+0x193/0x2d6
[ 642.597774] [] generic_make_request+0x9c/0xdd
[ 642.604279] [] submit_bio+0xbb/0xd4
[ 642.609739] [] ? inc_zone_page_state+0x27/0x29
[ 642.616262] [] submit_bh+0xf6/0x116
[ 642.621806] [] __block_write_full_page+0x200/0x2fd
[ 642.628676] [] ? blkdev_get_blocks+0x93/0x93
[ 642.635033] [] ? drop_buffers+0x96/0x96
[ 642.640850] [] ? blkdev_get_blocks+0x93/0x93
[ 642.647224] [] ? drop_buffers+0x96/0x96
[ 642.653218] [] block_write_full_page_endio+0x89/0x95
[ 642.660238] [] block_write_full_page+0x15/0x17
[ 642.666827] [] blkdev_writepage+0x18/0x1a
[ 642.672929] [] __writepage+0x14/0x2d
[ 642.678471] [] ? page_index+0x1a/0x1a
[ 642.684215] [] write_cache_pages+0x22e/0x366
[ 642.690480] [] ? page_index+0x1a/0x1a
[ 642.696224] [] generic_writepages+0x3e/0x58
[ 642.702510] [] do_writepages+0x1e/0x2b
[ 642.708236] [] __writeback_single_inode.isra.31+0x4c/0x123
[ 642.715938] [] writeback_sb_inodes+0x1d3/0x310
[ 642.722505] [] __writeback_inodes_wb+0x74/0xb9
[ 642.728979] [] wb_writeback+0x136/0x26c
[ 642.734905] [] ? global_dirty_limits+0x2f/0x10e
[ 642.741572] [] wb_do_writeback+0x185/0x1bb
[ 642.747656] [] bdi_writeback_thread+0xa5/0x1ce
[ 642.754191] [] ? wb_do_writeback+0x1bb/0x1bb
[ 642.760487] [] kthread+0x8b/0x93
[ 642.765725] [] kernel_thread_helper+0x4/0x10
[ 642.772058] [] ? retint_restore_args+0x13/0x13
[ 642.778559] [] ? kthread_worker_fn+0x149/0x149
[ 642.784956] [] ? gs_change+0x13/0x13
[ 642.790429] INFO: task blkid:4280 blocked for more than 20 seconds.
[ 642.796923] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 642.805005] blkid D ffff8806313f03a0 5720 4280 4221 0x00000000
[ 642.812464] ffff8804dcd37ac8 0000000000000082 ffff8806313f0000 ffff8804dcd37fd8
[ 642.820356] ffff8804dcd37fd8 0000000000012780 ffff880630e38000 ffff8806313f0000
[ 642.828250] ffffffff810608f2 ffff88062f038918 ffff8806313f0000 ffff88062f03891c
[ 642.836148] Call Trace:
[ 642.838712] [] ? need_resched+0x11/0x1d
[ 642.844313] [] schedule+0x65/0x67
[ 642.849405] [] schedule_preempt_disabled+0xe/0x10
[ 642.855865] [] __mutex_lock_common.isra.7+0x14a/0x166
[ 642.862884] [] __mutex_lock_slowpath+0x13/0x15
[ 642.869304] [] mutex_lock+0x18/0x29
[ 642.874805] [] __blkdev_get+0x9c/0x3da
[ 642.880574] [] ? blkdev_get+0x2ce/0x2ce
[ 642.886406] [] blkdev_get+0x189/0x2ce
[ 642.892103] [] ? find_get_page+0x4a/0x6a
[ 642.898052] [] ? __d_lookup_rcu+0xa2/0xc9
[ 642.904063] [] ? blkdev_get+0x2ce/0x2ce
[ 642.909908] [] blkdev_open+0x64/0x70
[ 642.915489] [] do_dentry_open.isra.17+0x16e/0x21d
[ 642.922308] [] nameidata_to_filp+0x42/0x84
[ 642.928422] [] do_last.isra.47+0x625/0x64b
[ 642.934490] [] path_openat+0xc5/0x2f4
[ 642.940187] [] do_filp_open+0x38/0x86
[ 642.945894] [] ? getname_flags+0x2a/0xa2
[ 642.951825] [] ? alloc_fd+0xe5/0xf7
[ 642.957334] [] do_sys_open+0x6e/0x102
[ 642.962969] [] sys_open+0x21/0x23
[ 642.968315] [] system_call_fastpath+0x16/0x1b

@cwedgwood
Contributor Author

This might be more meaningful:

[ 208.481594] BUG: soft lockup - CPU#9 stuck for 23s! [z_wr_iss/9:2337]
[ 208.488405] Modules linked in: zfs(O)[ 208.491582] BUG: soft lockup - CPU#10 stuck for 23s! [z_wr_iss/10:2338]
[ 208.491583] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) sg coretemp kvm_intel i2c_i801 lpc_ich mfd_core kvm i2c_core mpt2sas ghash_clmulni_intel microcode i7core_edac
[ 208.491596] CPU 10 Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) sg coretemp kvm_intel i2c_i801 lpc_ich mfd_core kvm i2c_core mpt2sas ghash_clmulni_intel microcode i7core_edac
[ 208.491605]
[ 208.491607] Pid: 2338, comm: z_wr_iss/10 Tainted: G O 3.5.3-cw3 #5 Supermicro X8DTL/X8DTL
[ 208.491610] RIP: 0010:[] [] spl_kmem_cache_alloc+0x19a/0xa9d [spl]
[ 208.491624] RSP: 0018:ffff88032e6dbca0 EFLAGS: 00000246
[ 208.491625] RAX: 0000000000000000 RBX: ffffffff814ec85d RCX: ffff88062ede21f8
[ 208.491626] RDX: ffff88062ede21e8 RSI: 0000000000000030 RDI: ffff88062ede2218
[ 208.491627] RBP: ffff88032e6dbd80 R08: 0000000000000000 R09: ffff880330981db8
[ 208.491628] R10: 0000000000000000 R11: ffff880330981d00 R12: ffff88032e6dbc18
[ 208.491629] R13: 0000000000000046 R14: ffff88032e6da000 R15: 0000000000000000
[ 208.491630] FS: 0000000000000000(0000) GS:ffff880333cc0000(0000) knlGS:0000000000000000
[ 208.491631] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 208.491633] CR2: 00007f43f7fc2e02 CR3: 000000000175b000 CR4: 00000000000007e0
[ 208.491634] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 208.491635] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 208.491636] Process z_wr_iss/10 (pid: 2338, threadinfo ffff88032e6da000, task ffff88032e6d43e0)
[ 208.491636] Stack:
[ 208.491637] ffff88032e6d43e0 ffff88009d08eec0 0000000000010000 ffff880630c16800
[ 208.491641] 0000000000000000 0000000000000001 ffff88032e6dbd10 ffff88032e6d43e0
[ 208.491645] ffff88062ede21f8 ffff88032e6dbd40 0000000000000202 ffff88062ede2160
[ 208.491648] Call Trace:
[ 208.491650] [] ? vdev_config_sync+0x137/0x137 [zfs]
[ 208.491696] [] ? spa_config_enter+0xc4/0xde [zfs]
[ 208.491720] [] zio_buf_alloc+0x22/0x24 [zfs]
[ 208.491753] [] zio_write_bp_init+0x19b/0x3da [zfs]
[ 208.491783] [] zio_execute+0xb8/0xe0 [zfs]
[ 208.491812] [] taskq_thread+0x2aa/0x4d2 [spl]
[ 208.491819] [] ? finish_task_switch+0x82/0xad
[ 208.491823] [] ? try_to_wake_up+0x1d4/0x1d4
[ 208.491827] [] ? task_done+0x110/0x110 [spl]
[ 208.491833] [] kthread+0x8b/0x93
[ 208.491836] [] kernel_thread_helper+0x4/0x10
[ 208.491841] [] ? retint_restore_args+0x13/0x13
[ 208.491845] [] ? kthread_worker_fn+0x149/0x149
[ 208.491847] [] ? gs_change+0x13/0x13
[ 208.491849] Code: 00 4c 8b ab f8 01 00 00 4c 3b ad 60 ff ff ff 49 8d 55 f0 0f 85 2e 07 00 00 48 8b 7d 90 e8 a3 d7 ff ff e8 52 3a 71 e0 fb 66 66 90 <66> 66 90 f6 05 58 e3 00 00 01 74 38 f6 05 58 e3 00 00 08 74 2f
[ 208.511549] BUG: soft lockup - CPU#11 stuck for 23s! [z_wr_iss/11:2339]
[ 208.511550] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) sg coretemp kvm_intel i2c_i801 lpc_ich mfd_core kvm i2c_core mpt2sas ghash_clmulni_intel microcode i7core_edac
[ 208.511561] CPU 11 Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) sg coretemp kvm_intel i2c_i801 lpc_ich mfd_core kvm i2c_core mpt2sas ghash_clmulni_intel microcode i7core_edac
[ 208.511571]
[ 208.511573] Pid: 2339, comm: z_wr_iss/11 Tainted: G O 3.5.3-cw3 #5 Supermicro X8DTL/X8DTL
[ 208.511575] RIP: 0010:[] [] spl_kmem_cache_alloc+0x19a/0xa9d [spl]
[ 208.511583] RSP: 0018:ffff88032e6ddca0 EFLAGS: 00000246
[ 208.511584] RAX: 0000000000000000 RBX: ffffffff814ec85d RCX: ffff88062ede21f8
[ 208.511585] RDX: ffff88062ede21e8 RSI: 0000000000000030 RDI: ffff88062ede2218
[ 208.511586] RBP: ffff88032e6ddd80 R08: 0000000000000000 R09: ffff880330981db8
[ 208.511587] R10: 0000000000000000 R11: ffff880330981d00 R12: ffff88032e6ddc18
[ 208.511588] R13: 0000000000000046 R14: ffff88032e6dc000 R15: 0000000000000000
[ 208.511589] FS: 0000000000000000(0000) GS:ffff880333ce0000(0000) knlGS:0000000000000000
[ 208.511590] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 208.511592] CR2: 00007f43f7fc2e02 CR3: 000000000175b000 CR4: 00000000000007e0
[ 208.511593] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 208.511593] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 208.511595] Process z_wr_iss/11 (pid: 2339, threadinfo ffff88032e6dc000, task ffff88032e6d5a80)
[ 208.511595] Stack:
[ 208.511596] ffff88032e6d5a80 ffff880540f46320 0000000000010000 ffff880630c16800
[ 208.511600] 0000000000000000 0000000000000001 ffff88032e6ddd10 ffff88032e6d5a80
[ 208.511604] ffff88062ede21f8 ffff88032e6ddd40 0000000000000206 ffff88062ede2160
[ 208.511607] Call Trace:
[ 208.511609] [] ? vdev_config_sync+0x137/0x137 [zfs]
[ 208.511635] [] ? spa_config_enter+0xc4/0xde [zfs]
[ 208.511660] [] zio_buf_alloc+0x22/0x24 [zfs]
[ 208.511689] [] zio_write_bp_init+0x19b/0x3da [zfs]
[ 208.511719] [] zio_execute+0xb8/0xe0 [zfs]
[ 208.511748] [] taskq_thread+0x2aa/0x4d2 [spl]
[ 208.511754] [] ? finish_task_switch+0x82/0xad
[ 208.511757] [] ? try_to_wake_up+0x1d4/0x1d4
[ 208.511760] [] ? task_done+0x110/0x110 [spl]
[ 208.511766] [] kthread+0x8b/0x93
[ 208.511769] [] kernel_thread_helper+0x4/0x10
[ 208.511772] [] ? retint_restore_args+0x13/0x13
[ 208.511775] [] ? kthread_worker_fn+0x149/0x149
[ 208.511777] [] ? gs_change+0x13/0x13
[ 208.511779] Code: 00 4c 8b ab f8 01 00 00 4c 3b ad 60 ff ff ff 49 8d 55 f0 0f 85 2e 07 00 00 48 8b 7d 90 e8 a3 d7 ff ff e8 52 3a 71 e0 fb 66 66 90 <66> 66 90 f6 05 58 e3 00 00 01 74 38 f6 05 58 e3 00 00 08 74 2f

@behlendorf
Contributor

That's a new one. What version of the SPL/ZFS were you using?

@cwedgwood
Contributor Author

commit 2b28613
Author: Brian Behlendorf [email protected]
Date: Wed Aug 29 11:52:47 2012 -0700

Should I git pull and retest? It's as simple as creating a pool with a single (in this case 1TB SATA) disk, creating a zvol, then copying data to it.
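
In command form, that amounts to roughly this (hypothetical device name and volume size):

# single-disk pool, zvol, then stream data into it
zpool create tank /dev/sdX
zfs create -V 100G tank/vol
cat /path/to/data > /dev/zvol/tank/vol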

@behlendorf
Contributor

I just wanted to make sure it was something current. This may be an issue with Linux 3.5.3 that we'll have to investigate further, but this code has been working well with older kernels for a while now.

@cwedgwood
Contributor Author

Should I retest with 3.4.x or older still?

@behlendorf
Contributor

Yes, 3.4 would be a good start. I regularly test with zvols and this kernel, so if it doesn't work we'll have to figure out what's going on in your environment.

@cwedgwood
Contributor Author

3.4.10 works flawlessly under the same conditions. I guess this is a 3.5.x issue :-(

@behlendorf
Contributor

That at least narrows down where to look. There must have been another API change in the 3.5 kernel.

@cwedgwood
Contributor Author

On 3.4.10 this happens as well:

[  555.735918] Pid: 3422, comm: z_wr_iss/1 Tainted: G           O 3.4.10-cw4-00001-g23bfbbf #7 Supermicro X8DTL/X8DTL
[  555.735922] RIP: 0010:[<ffffffffa001a0ae>]  [<ffffffffa001a0ae>] spl_kmem_cache_alloc+0x19a/0xa9d [spl]
[  555.735934] RSP: 0018:ffff88062eadbca0  EFLAGS: 00000246
[  555.735936] RAX: 0000000000000000 RBX: ffffffff814e129d RCX: ffff88062f896df8
[  555.735938] RDX: ffff88062f896de8 RSI: 0000000000000230 RDI: ffff88062f896e18
[  555.735940] RBP: ffff88062eadbd80 R08: 0000000000000000 R09: ffff8803301172b8
[  555.735942] R10: 0000000000000000 R11: ffff880330117200 R12: ffff88062eadbc18
[  555.735944] R13: 0000000000000046 R14: ffff88062eada000 R15: 0000000000000000
[  555.735947] FS:  0000000000000000(0000) GS:ffff880333c20000(0000) knlGS:0000000000000000
[  555.735949] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  555.735951] CR2: ffffffffff600400 CR3: 000000000174d000 CR4: 00000000000007e0
[  555.735953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  555.735955] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  555.735957] Process z_wr_iss/1 (pid: 3422, threadinfo ffff88062eada000, task ffff88062eae0000)
[  555.735959] Stack:
[  555.735960]  ffff88062eadbcd0 ffffffffa01c39ce ffff88031558fa60 0000000000020000
[  555.735965]  ffff880330d7d800 0000000000000000 ffff88062eadbd10 ffff88062eae0000
[  555.735969]  ffff88062f896df8 ffff88062eadbd40 0000000000000206 ffff88062f896d60
[  555.735973] Call Trace:
[  555.736020]  [<ffffffffa01c39ce>] ? zio_vdev_io_done+0x3a/0x143 [zfs]
[  555.736046]  [<ffffffffa0197644>] ? vdev_config_sync+0x137/0x137 [zfs]
[  555.736071]  [<ffffffffa018c4b5>] ? spa_config_enter+0xc4/0xde [zfs]
[  555.736100]  [<ffffffffa01c463d>] zio_buf_alloc+0x22/0x24 [zfs]
[  555.736131]  [<ffffffffa01c4826>] zio_write_bp_init+0x19b/0x3da [zfs]
[  555.736161]  [<ffffffffa01c412d>] zio_execute+0xb8/0xe0 [zfs]
[  555.736168]  [<ffffffffa001c1a2>] taskq_thread+0x2aa/0x4d2 [spl]
[  555.736174]  [<ffffffff8105e46c>] ? finish_task_switch+0x82/0xad
[  555.736177]  [<ffffffff81062757>] ? try_to_wake_up+0x1d4/0x1d4
[  555.736184]  [<ffffffffa001bef8>] ? task_done+0x110/0x110 [spl]
[  555.736189]  [<ffffffff8105496a>] kthread+0x8b/0x93
[  555.736194]  [<ffffffff814e8b74>] kernel_thread_helper+0x4/0x10
[  555.736198]  [<ffffffff814e129d>] ? retint_restore_args+0x13/0x13
[  555.736201]  [<ffffffff810548df>] ? kthread_worker_fn+0x149/0x149
[  555.736204]  [<ffffffff814e8b70>] ? gs_change+0x13/0x13

As before it's so bad I have to reset the machine hard.

@cwedgwood
Contributor Author

@behlendorf it seems this exists in 3.4.x as well, though 3.5.x triggers it more easily

i'm guessing it's a low memory / memory reclaim issue

to trigger this you need to write more data than there is memory in the system (1.5x suffices in my case)

cat path/to/vm/image > /dev/zvol/...

if you avoid memory pressure the problem never seems to trigger, for example by doing:

cat path/to/vm | dd of=/dev/zvol/... iflag=fullblock bs=64M oflag=direct

or

echo 1 > /proc/sys/vm/drop_caches
# (run during write back)

if the arc size is limited to under half the total memory of the system, things behave much better; the suggestion by @byteharmony in issue #944 (#944 (comment)) seems to help greatly here
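
for reference, a sketch of capping the arc via the zfs_arc_max module parameter (the 4GB value is only an example):

# at runtime
echo $((4 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max
# persistent across module reloads
echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf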

i wonder if the logic that the arc can be 1/2 the system memory is a bit aggressive on a dual-socket (numa) system, because it's possible (for whatever reason) to exhaust all the memory in a node; it might be more reasonable to limit the arc size to some fraction of the smallest node

if you're testing this on a large system and can't make it show up, try allocating a bunch of large (2M) pages in each node to artificially constrain the amount of generically available memory

e.g.:

# reserve 32GB on each node, 64GB in total on a 2-socket system
echo $((32*512)) > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo $((32*512)) > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

(adjust as needed)

@behlendorf
Contributor

@cwedgwood Interesting. Have you looked at the maximum number of emergency objects kicked in? You can check this in the last few columns of /proc/spl/kmem/slab. I did see some rough spots where the system would appear to hang when there's a sudden massive demand for these objects and memory is already low. However, in all my testing it did manage to work itself out in a few minutes.

@byteharmony

There is a lot of info in that proc file. What are we looking for there?

My KVM test/devel environment has yet to fail on anything (Hyper-V didn't make it out of the gates, and physical servers appear to be second best). I've now got serial configured for the physical boxes. I have one I'm collecting crash info from for you right now. (Need to do it again, the 200 line limit isn't quite long enough ;).

BK

@behlendorf
Contributor

In particular I'm interested in the last three columns of the zio_buf_131072 and zio_data_buf_131072 rows. They will show the worst-case usage for these large allocations. I'm wondering if the need for them spiked at some point and we have several hundred or thousand outstanding.
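
For example, something along these lines pulls just those rows (plus the header) so the trailing emergency columns are easy to watch:

# the header row and the two 128K rows; the last three columns are the emergency object counts
grep -E '^(name|zio(_data)?_buf_131072)' /proc/spl/kmem/slab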

@cwedgwood
Contributor Author

@behlendorf updated to rc11 and it still happens

i have:

  • zone_reclaim_mode = 5
  • min_unmapped_ratio = 3
  • i limit the arc to 1/6th of the system RAM (i'm desperate to make this go away)

I get 1000s of traces like:

[ 529.026866] [] ? kmalloc_nofail+0x2c/0x3e [spl]
[ 529.033462] [] zio_buf_alloc+0x22/0x24 [zfs]
[ 529.039706] [] vdev_raidz_io_start+0x252/0x538 [zfs]
[ 529.046720] [] zio_vdev_io_start+0x210/0x222 [zfs]
[ 529.053661] [] zio_nowait+0xdc/0x103 [zfs]
[ 529.059755] [] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 529.066920] [] ? vdev_config_sync+0x134/0x134 [zfs]
[ 529.073808] [] ? spa_config_enter+0xc4/0xde [zfs]
[ 529.080513] [] zio_vdev_io_start+0x4f/0x222 [zfs]
[ 529.087209] [] zio_execute+0xb8/0xe0 [zfs]
[ 529.093294] [] taskq_thread+0x2aa/0x4d2 [spl]
[ 529.099583] [] ? finish_task_switch+0x82/0xad
[ 529.105986] [] ? try_to_wake_up+0x1d4/0x1d4
[ 529.112138] [] ? task_done+0x110/0x110 [spl]
[ 529.118448] [] kthread+0x8b/0x93
[ 529.123680] [] kernel_thread_helper+0x4/0x10
[ 529.130027] [] ? retint_restore_args+0x13/0x13
[ 529.136534] [] ? kthread_worker_fn+0x149/0x149
[ 529.142959] [] ? gs_change+0x13/0x13

that crush the machine and require a reset

watching the spl kmem stats until this happens (one second intervals):

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0

zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
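
for reference, a one-second sampling loop along these lines produces output in that form (a sketch):

while sleep 1; do
    grep -E 'zio(_data)?_buf_131072' /proc/spl/kmem/slab
    echo
done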

@cwedgwood
Contributor Author

Adding to this.

It's trivial to break things with a local XFS filesystem on a ZVOL and some moderate write pressure. Typically things break within seconds, minutes at most.

This actually isn't even a contrived situation; I use ZVOLs for testing/development often and have basically had to abandon this as it's unusable.

@behlendorf
Contributor

@cwedgwood Have you tried explicitly setting both the ZVOL block size and the XFS block size to 4k? This should result in the easiest workload for the Linux VM and VFS. If the block sizes are mismatched it will result in lots of extra read-modify-write operations which might cause performance problems.
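
Something along these lines, with hypothetical pool/volume names, keeps the two block sizes matched at 4k:

# hypothetical names; the point is matching volblocksize and the XFS block size
zfs create -V 20G -o volblocksize=4k tank/xfsvol
mkfs.xfs -b size=4096 /dev/zvol/tank/xfsvol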

@cwedgwood
Contributor Author

@behlendorf yes, I explicitly use 4K blocks in all cases

@cwedgwood
Contributor Author

@behlendorf @ryao

testing with numa disabled (that is, disabled in the chipset, so the ram is interleaved between nodes on cacheline boundaries, there is no SRAT, etc.; everything is flat)

it still breaks, hard, in exactly the same way

that is, when memory is low it wedges up solid

watching /proc/meminfo, this is the last output i get before it dies:

Wed Oct  3 09:49:09 PDT 2012
----
MemTotal:       24751420 kB
MemFree:          236368 kB
Buffers:            8804 kB
Cached:         15989516 kB
SwapCached:            0 kB
Active:            56876 kB
Inactive:       15963808 kB
Active(anon):      22016 kB
Inactive(anon):     1824 kB 
Active(file):      34860 kB
Inactive(file): 15961984 kB

with most stack traces in the form of:

[  727.598257] Call Trace:
[  727.598259]  [<ffffffffa0b6ed56>] ? zio_vdev_io_done+0x3a/0x143 [zfs]
[  727.598309]  [<ffffffffa0b42824>] ? vdev_config_sync+0x134/0x134 [zfs]
[  727.598335]  [<ffffffffa0b375fd>] ? spa_config_enter+0xc4/0xde [zfs]
[  727.598360]  [<ffffffffa0b6f9c5>] zio_buf_alloc+0x22/0x24 [zfs]
[  727.598390]  [<ffffffffa0b6fbae>] zio_write_bp_init+0x19b/0x3d9 [zfs]
[  727.598421]  [<ffffffffa0b6f4b5>] zio_execute+0xb8/0xe0 [zfs]
[  727.598451]  [<ffffffffa080d467>] taskq_thread+0x2aa/0x4d2 [spl]
[  727.598458]  [<ffffffff8105fe03>] ? finish_task_switch+0x82/0xad
[  727.598463]  [<ffffffff8106408a>] ? try_to_wake_up+0x1d4/0x1d4
[  727.598468]  [<ffffffffa080d1bd>] ? task_done+0x110/0x110 [spl]
[  727.598474]  [<ffffffff81055c61>] kthread+0x6f/0x77
[  727.598478]  [<ffffffff814f4934>] kernel_thread_helper+0x4/0x10
[  727.598483]  [<ffffffff814ecfdd>] ? retint_restore_args+0x13/0x13
[  727.598487]  [<ffffffff81055bf2>] ? kthread_freezable_should_stop+0x43/0x43
[  727.598489]  [<ffffffff814f4930>] ? gs_change+0x13/0x13

@behlendorf given how easy this is to break, would it be useful to reproduce this in a kvm vm/instance which i can attach to the nearest pigeon heading your way?

@cwedgwood
Contributor Author

@behlendorf
Contributor

@cwedgwood OK, I give. I'll see if I can recreate it. If I can do that, it'll be a lot easier to kill. Can you post a trivial recipe for recreating this including:

  • How the VM is configured (Linux distribution, kernel version, total memory, etc)
  • How the zpool is configured
  • The command required to cause the issue.

@aarrpp

aarrpp commented Oct 6, 2012

Try echo 3 >/proc/sys/vm/drop_caches.
It works for me on Xen dom0.
I use zvols for Xen domU but sometimes ZFS locks happen.

@toadicus

toadicus commented Oct 8, 2012

I'm also having this problem (I believe); you can see my conversation with Fajar in zfs-discuss about it here:

https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!topic/zfs-discuss/o26bGiNQ2z0

I see this problem in Gentoo (running Linux 3.4.9) and CentOS 6.3 (running Linux 2.6.32-279.9.1.el6) in (at least) versions 0.6.0-rc10 and 0.6.0-rc11 of spl/zfs. With any level of zfs_arc_max restriction (as much as 50% of RAM; 4GB in one case, 16GB in another, and down to 512MB), persistent writing to zvols drives the memory usage up until the system softlocks. The CentOS machine managed to kill arc_adapt and blkid at one point; there's a log here:

http://pastebin.com/4zQtY27t

Let me know what else I can give you.

@behlendorf
Contributor

@drukargin The most useful thing you could post would be a trivial reproducer for CentOS 6.3 which I can run in a VM. If I can reproduce the issue with a ZVOL in a VM then there's a good chance we can make short work of it.

@toadicus

As an update:

I'm having trouble reproducing the failure in a CentOS VM. It's possible the failure in my CentOS box was something similar to #1036, as it has been greatly helped since I tweaked zfs_arc_max down to about 1/3 of the RAM after my VM commitments (32 GB total, 8 GB VMs, 8 GB zfs_arc_max).

The Gentoo box still has the problem; I'm going to try a few things on it today, and rebuild my VM with Gentoo to see if I can reproduce it there.

@toadicus

Further update: I tried downgrading the Gentoo box from 3.4.9 to 3.3.8 to 3.2.12; all continued to exhibit the same problem. I booted up a CentOS LiveCD, installed zfs and ran my test case with no problems. I'll retask my VM to work with Gentoo, and see if I can make you a test case (I'll give you the root OS so you don't need to build it yourself).

@byteharmony

Rereading this whole thread I see this is an issue with kernel 3.x. I'm only on 2.6 so I'm not of much help here. When CentOS moves to 3.x I'll be in the game!

I have no issues using dd to move data onto a zvol with 2.6.

BK

@behlendorf
Contributor

@cwedgwood OK, I think I've been able to reproduce this issue and I'm starting to dig into it. It certainly looks like some sort of thrashing in the VM as you described. However, I'm only able to recreate the problem when I use a small zvol block size, such as the default 8K; when I increase the block size to 128K it's much better behaved. Does this match your experience?
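
For comparison, the two cases can be set up along these lines (hypothetical pool/volume names and sizes):

zfs create -V 20G -o volblocksize=8k tank/vol8k      # small block size that thrashes here
zfs create -V 20G -o volblocksize=128k tank/vol128k  # larger block size that behaves better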

@mauricev

@behlendorf
Contributor

@mauricev I'm actually looking at this issue now. Despite @cwedgwood's ability to hit it easily, I'm not having much luck reproducing it. I can certainly make the VM work hard and get relatively poor I/O performance, but nothing resembling an actual failure. Can you describe your hardware (CPUs, memory, disk), exact test case (zvol block size, workload), and what it looks like when you get a failure?

@mauricev

You can try setting up a Windows system with a hard drive filled with a few hundred gigs. I set up Gentoo Linux (given 4 cores, 8192 MB of RAM, and kernel 3.4.10, although you can also use 3.6.1) in VMware Fusion, created three 200 GB vmdks split into 2 GB files, and created a raidz pool with them. Then I created a zvol with the most space possible, 375G. I then exported this out using the built-in iSCSI (with targetcli). Then, on the Windows (W2K8 SP1) system, I copy the filled hard drive to the zvol using a program called HDClone. It runs for a short while, then HDClone begins reporting errors and soft lockups start. Unfortunately, the kernel doesn't give any more debugging info and the SSH console is locked up.

My Mac is a Mac Pro with dual quad-core 2.8 GHz CPUs, and the W2K8 system is a VM running on ESXi 4.1 (dual quad-core 2.93 GHz). It's assigned 2 cores and 4.5 GB RAM.

Oddly, I just tried a pool of one drive (with volblocksize=128k to avoid the lockup) and the performance was even worse, barely making 1 MB/second.

@byteharmony

So to make sure I understand: you are using ZoL to share out an iSCSI LUN, and data is being pushed onto it by the HDClone program writing to the iSCSI-mounted ZoL LUN from the Windows machine.

We do something very similar with Windows backups sent to an iSCSI LUN on CentOS 6.3 using ZoL. No stability problems. It's possible your problem lies between your iSCSI target software and ZoL.

My .02
BK

@mauricev

Yep, that's it. If I use volblocksize=128K, it runs fine, but very slowly. I don't yet know whether the slowness is due to ZFS, iSCSI, or somewhere else in the path.

@byteharmony

I've attempted to experiment with the volblocksize from Windows to see what is fastest with different speed tests, but the layers of software (iSCSI and Windows) don't allow for effective testing. I may have a new way to do better testing: connect the ZVOL directly to a KVM Linux-based guest OS. Haven't done the tests yet though; will post results when I get around to that.

A few things to consider: NTFS block size. A 64K size would make sense on a 128K block size. Perhaps trying 64K and 64K?

BK

@behlendorf
Contributor

For those chasing this issue, you could try running with the following branches. They include the following patches:

  • A patch to silence the "blocked thread" console message when it's observed in zio_wait() due to slow I/O. Instead an event with more useful debugging information will be generated which can be dumped by running zpool events -v. This should mostly resolve the console noise issue which only compounds the problem.
  • A patch which adds a /proc/spl/kstat/zfs/txg-<pool> file which shows detailed information about the life cycle of each TXG. Useful for seeing what's really going on.

https://github.com/behlendorf/spl/branches/stats
https://github.com/behlendorf/zfs/branches/stats
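
Once running that code, the new information can be pulled with something along these lines (pool name assumed to be "tank"):

# dump the detailed events generated in place of the console messages
zpool events -v

# follow the per-TXG life cycle stats added by the branch
watch -n 1 cat /proc/spl/kstat/zfs/txg-tank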

At the moment I'm unable to recreate anything more serious than sluggish performance on my test system.

@behlendorf
Contributor

After several hours of making copies of the kernel source tree in an XFS file system layered on top of a zvol with a 4k block size, I was able to make my test system quite laggy. Performance to the pool was very bad and I was seeing kswapd and arc_adapt take a large amount of CPU time.

I was never able to produce any "blocked thread" warnings, but I did profile the system (using oprofile) and determined that all the CPU time was going to spl_emergency_free and spl_emergency_alloc. This behavior is consistent with the original backtraces posted by @cwedgwood.

The profile points to an issue which I was originally concerned about when writing the emergency slab object code, commit e2dcc6e. However, at the time I wasn't able to come up with a real workload which caused the problem, so I opted against prematurely optimizing for this case. The issue is that all the emergency objects which get created are tracked on a single list protected by a spin lock. As long as we never create too many of these objects it's not an issue, and really these should only be created under unlikely circumstances. However, in this test case that's just not true.

My system racked up ~150,000 of these objects as arc buf headers, which are quite long lived. That dragged performance way down and could easily lead to starvation the more CPUs you have contending on this lock. The good news is that I put a patch together months ago to improve this by moving the emergency objects to a hash. As I said, I just never merged it because it made the code more complicated and I wasn't 100% sure we needed it.

https://github.com/behlendorf/spl/tree/slab_hash

I've merged that patch into the previous "stats" branches and I expect it will help, but I haven't had a chance to seriously test it just yet. If someone gets a chance to, that would be great. What you want to look for is that the CPU usage drops considerably and the max depth shown in /proc/spl/kmem/slab stays at some small value, particularly for the arc_buf_* slabs.
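
For example, something like this pulls the header plus just the arc_buf_* rows so the trailing max columns are easy to eyeball:

grep -E '^(name|arc_buf)' /proc/spl/kmem/slab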

@behlendorf
Contributor

You might want to hold off on testing the slab hash patch. It looks like it needs a little more polish; I managed to crash my system with it twice. I'll try and run down any lingering issues this weekend.

@behlendorf
Contributor

@cwedgwood I have a set of patches which should significantly improve your use case. They resolve a number of performance issues in the slab implementation which were causing the system to thrash. Please try the code on the following branches and let me know.

https://github.com/behlendorf/spl/branches/stats
https://github.com/behlendorf/zfs/branches/stats

@cwedgwood
Contributor Author

@behlendorf it seems a lot better, but with enough effort I still get this on the serial console:

[ 2504.454915] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:27861]
[ 2504.464903] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:27862]
[ 2504.474888] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:27863]
[ 2504.494857] BUG: soft lockup - CPU#3 stuck for 22s! [z_wr_iss/3:27864]
[ 2504.664594] BUG: soft lockup - CPU#4 stuck for 22s! [z_wr_iss/4:27865]
[ 2504.674576] BUG: soft lockup - CPU#5 stuck for 22s! [z_wr_iss/5:27866]
[ 2504.684565] BUG: soft lockup - CPU#6 stuck for 22s! [z_wr_iss/6:27867]
[ 2504.704533] BUG: soft lockup - CPU#7 stuck for 22s! [z_wr_iss/7:27868]
[ 2504.714516] BUG: soft lockup - CPU#8 stuck for 22s! [z_wr_iss/8:27869]
[ 2504.724502] BUG: soft lockup - CPU#9 stuck for 22s! [z_wr_iss/9:27870]
[ 2504.744471] BUG: soft lockup - CPU#10 stuck for 22s! [z_wr_iss/10:27871]
[ 2504.754456] BUG: soft lockup - CPU#11 stuck for 22s! [z_wr_iss/11:27872]
[ 2504.764444] BUG: soft lockup - CPU#12 stuck for 22s! [z_wr_iss/12:27873]
[ 2504.784413] BUG: soft lockup - CPU#13 stuck for 22s! [z_wr_iss/13:27874]
[ 2504.794398] BUG: soft lockup - CPU#14 stuck for 22s! [z_wr_iss/14:27875]
[ 2504.804379] BUG: soft lockup - CPU#15 stuck for 22s! [z_wr_iss/15:27876]

[ 2532.412063] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:27861]
[ 2532.422044] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:27862]
[ 2532.432029] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:27863]
[ 2532.451997] BUG: soft lockup - CPU#3 stuck for 22s! [z_wr_iss/3:27864]
[ 2532.621740] BUG: soft lockup - CPU#4 stuck for 22s! [z_wr_iss/4:27865]
[ 2532.631726] BUG: soft lockup - CPU#5 stuck for 22s! [z_wr_iss/5:27866]
[ 2532.641709] BUG: soft lockup - CPU#6 stuck for 22s! [z_wr_iss/6:27867]
[ 2532.661679] BUG: soft lockup - CPU#7 stuck for 22s! [z_wr_iss/7:27868]
[ 2532.671666] BUG: soft lockup - CPU#8 stuck for 22s! [z_wr_iss/8:27869]
[ 2532.681649] BUG: soft lockup - CPU#9 stuck for 22s! [z_wr_iss/9:27870]
[ 2532.701615] BUG: soft lockup - CPU#10 stuck for 22s! [z_wr_iss/10:27871]
[ 2532.711599] BUG: soft lockup - CPU#11 stuck for 22s! [z_wr_iss/11:27872]
[ 2532.721585] BUG: soft lockup - CPU#12 stuck for 22s! [z_wr_iss/12:27873]
[ 2532.741557] BUG: soft lockup - CPU#13 stuck for 22s! [z_wr_iss/13:27874]
[ 2532.751542] BUG: soft lockup - CPU#14 stuck for 22s! [z_wr_iss/14:27875]
[ 2532.761527] BUG: soft lockup - CPU#15 stuck for 22s! [z_wr_iss/15:27876]

[ 2560.369203] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:27861]
[ 2560.379187] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:27862]
[ 2560.389175] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:27863]
[ 2560.409146] BUG: soft lockup - CPU#3 stuck for 22s! [z_wr_iss/3:27864]
[ 2560.578884] BUG: soft lockup - CPU#4 stuck for 22s! [z_wr_iss/4:27865]
[ 2560.588866] BUG: soft lockup - CPU#5 stuck for 22s! [z_wr_iss/5:27866]
[ 2560.598850] BUG: soft lockup - CPU#6 stuck for 22s! [z_wr_iss/6:27867]
[ 2560.618820] BUG: soft lockup - CPU#7 stuck for 22s! [z_wr_iss/7:27868]
[ 2560.628804] BUG: soft lockup - CPU#8 stuck for 22s! [z_wr_iss/8:27869]
[ 2560.638792] BUG: soft lockup - CPU#9 stuck for 22s! [z_wr_iss/9:27870]
[ 2560.658761] BUG: soft lockup - CPU#10 stuck for 22s! [z_wr_iss/10:27871]
[ 2560.668746] BUG: soft lockup - CPU#11 stuck for 22s! [z_wr_iss/11:27872]
[ 2560.678731] BUG: soft lockup - CPU#12 stuck for 22s! [z_wr_iss/12:27873]
[ 2560.698700] BUG: soft lockup - CPU#13 stuck for 22s! [z_wr_iss/13:27874]
[ 2560.708685] BUG: soft lockup - CPU#14 stuck for 22s! [z_wr_iss/14:27875]
[ 2560.718668] BUG: soft lockup - CPU#15 stuck for 22s! [z_wr_iss/15:27876]

at which point i had to reboot to get the machine back

@behlendorf
Contributor

Well that's some progress at least. What kernel version were you using?

@cwedgwood
Contributor Author

That last test was 3.6.4 but the problem exists in 3.4.x and 3.5.x as well. I had earlier switched to 3.6.x in the hope it would help.

@behlendorf
Contributor

OK, I only ask because there was a serious vmalloc() performance regression between 2.6.31 and 2.6.38 which I suspect is causing some of the slowness in my testing. I'm patching the RHEL kernel now with the upstream fix to see how much it improves things.

torvalds/linux@8969960

Before your system hangs do you notice the [events/x] threads starting to take a majority of the available CPU?

@cwedgwood
Contributor Author

for reference

        NAME        STATE     READ WRITE CKSUM
        tank0       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
        logs
          sdb1      ONLINE       0     0     0
        cache
          sdc       ONLINE       0     0     0
          sdd       ONLINE       0     0     0

log and cache devices don't seem to affect things

ashift=12

sde...sdi are 2TiB AF (4k sector) 'green' devices (i.e. slow)

@cwedgwood
Contributor Author

a tiny amount of IO (500MB of small files) ... worked fine, then about 5s later (guessing) the machine wedged up with:

[44412.678135] INFO: rcu_sched self-detected stall on CPU
[44412.678138] INFO: rcu_sched self-detected stall on CPU
[44412.678140] INFO: rcu_sched self-detected stall on CPU

busy processes

28998 root      20   0     0    0    0 D    40  0.0   0:00.69 flush-240:0                                
  660 root      20   0     0    0    0 D    37  0.0   0:01.22 kswapd1                                    
22567 root      39  19     0    0    0 S    19  0.0   0:00.25 zvol/1                                     
22569 root      39  19     0    0    0 D    17  0.0   0:00.23 zvol/3                                     
22573 root      39  19     0    0    0 D    17  0.0   0:00.23 zvol/7                                     
22574 root      39  19     0    0    0 D    17  0.0   0:00.23 zvol/8                                     
22568 root      39  19     0    0    0 D    16  0.0   0:00.22 zvol/2                                     
22570 root      39  19     0    0    0 D    16  0.0   0:00.22 zvol/4                                     
22575 root      39  19     0    0    0 D    16  0.0   0:00.22 zvol/9                                     
22577 root      39  19     0    0    0 D    16  0.0   0:00.23 zvol/11                                    
22571 root      39  19     0    0    0 D    15  0.0   0:00.21 zvol/5                                     
22566 root      39  19     0    0    0 R    14  0.0   0:00.21 zvol/0                                     
22572 root      39  19     0    0    0 R    14  0.0   0:00.19 zvol/6                                     
22576 root      39  19     0    0    0 D    14  0.0   0:00.21 zvol/10                                    
23163 root      20   0     0    0    0 S    13  0.0   0:05.94 kworker/6:4                                
23801 root      20   0     0    0    0 S    12  0.0   0:03.85 kworker/1:4                                
27331 root      20   0     0    0    0 S    12  0.0   0:00.23 kworker/4:0                                
 1016 root      20   0     0    0    0 S    11  0.0   0:01.89 kworker/9:1                                
 1076 root      20   0     0    0    0 S    11  0.0   0:02.62 kworker/10:1                               
22870 root      20   0     0    0    0 D    11  0.0   0:05.53 kworker/7:3                                
23603 root      20   0     0    0    0 R    11  0.0   0:03.34 kworker/3:5                                
 1099 root      20   0     0    0    0 R     9  0.0   0:01.84 kworker/8:1                                
 1100 root      20   0     0    0    0 S     9  0.0   0:02.40 kworker/11:1                               
23605 root      20   0     0    0    0 S     9  0.0   0:05.42 kworker/5:4                                
27869 root      20   0     0    0    0 S     8  0.0   0:00.08 kworker/0:2                                

wedged processes

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         4  0.0  0.0      0     0 ?        D    Nov01   0:13 [kworker/0:0]
root       660  0.0  0.0      0     0 ?        D    Nov01   0:01 [kswapd1]
root      1081  0.0  0.0      0     0 ?        D    Nov01   0:00 [xfsaild/sda3]
root     22566  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/0]
root     22568  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/2]
root     22569  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/3]
root     22570  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/4]
root     22571  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/5]
root     22572  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/6]
root     22573  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/7]
root     22574  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/8]
root     22575  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/9]
root     22576  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/10]
root     22577  0.0  0.0      0     0 ?        DN   Nov01   0:00 [zvol/11]
root     22870  0.0  0.0      0     0 ?        D    Nov01   0:05 [kworker/7:3]
root     23828  0.0  0.0      0     0 ?        DN   Nov01   0:02 [z_wr_iss/0]
root     23830  0.0  0.0      0     0 ?        DN   Nov01   0:02 [z_wr_iss/2]
root     23831  0.0  0.0      0     0 ?        DN   Nov01   0:02 [z_wr_iss/3]
root     23832  0.0  0.0      0     0 ?        DN   Nov01   0:02 [z_wr_iss/4]
root     23833  0.0  0.0      0     0 ?        DN   Nov01   0:02 [z_wr_iss/5]
root     23835  0.0  0.0      0     0 ?        DN   Nov01   0:03 [z_wr_iss/7]
root     23836  0.0  0.0      0     0 ?        DN   Nov01   0:03 [z_wr_iss/8]
root     23837  0.0  0.0      0     0 ?        DN   Nov01   0:03 [z_wr_iss/9]
root     23838  0.0  0.0      0     0 ?        DN   Nov01   0:02 [z_wr_iss/10]
root     23840  0.0  0.0      0     0 ?        DN   Nov01   0:03 [z_wr_iss/12]
root     23841  0.0  0.0      0     0 ?        DN   Nov01   0:03 [z_wr_iss/13]
root     23843  0.0  0.0      0     0 ?        DN   Nov01   0:03 [z_wr_iss/15]
root     23850  0.0  0.0      0     0 ?        DN   Nov01   0:00 [z_wr_int/1]
root     23861  0.0  0.0      0     0 ?        DN   Nov01   0:01 [z_wr_int/12]
root     28998  4.7  0.0      0     0 ?        D    10:31   0:00 [flush-240:0]

meminfo

MemTotal:       24751200 kB
MemFree:          167020 kB
Buffers:            3872 kB
Cached:          5918016 kB
SwapCached:          176 kB
Active:          1854856 kB
Inactive:        4093176 kB
Active(anon):     531124 kB
Inactive(anon):    33660 kB
Active(file):    1323732 kB
Inactive(file):  4059516 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3903788 kB
SwapFree:        3901768 kB
Dirty:            217920 kB
Writeback:          2884 kB
AnonPages:         25276 kB
Mapped:             9852 kB
Shmem:            539296 kB
Slab:            1314196 kB
SReclaimable:     580856 kB
SUnreclaim:       733340 kB
KernelStack:        3784 kB
PageTables:         4816 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16279388 kB
Committed_AS:     640276 kB
VmallocTotal:   34359738367 kB
VmallocUsed:    10801752 kB
VmallocChunk:   34335371952 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        3648 kB
DirectMap2M:     2084864 kB
DirectMap1G:    23068672 kB

spl slab

--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----  --- emergency ---
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max
spl_vn_cache                          0x00020         0         0     4096       88      0     0    12      0     0   252      0     0     0
spl_vn_file_cache                     0x00020         0         0     4096       96      0     0    12      0     0   252      0     0     0
spl_zlib_workspace_cache              0x00140         0         0  8388608   268104      0     0     0      0     0     0      0     0     0
zio_cache                             0x00020  13615104  12642560    16384      952    831   830   830  13296 13280 13280      0     0     0
zio_link_cache                        0x00020   1261568    663120     4096       48    308   307   307  13860 13815 13815      0     0     0
zio_vdev_cache                        0x00040  25165824  17434704  4194304   131088      6     6     9    180   133   256      0     0     0
zio_buf_512                           0x00020 1959755776 918073856    32768      512  59807 59807 60219  1854017 1793113 1866789      0     0     0
zio_data_buf_512                      0x00020  31555584  15273984    32768      512    963   963   964  29853 29832 29884      0     0     0
zio_buf_1024                          0x00040     65536     43008    32768     1024      2     2   788     42    42 16548      0     0     0
zio_data_buf_1024                     0x00040  23232512  15173632    32768     1024    709   709   709  14889 14818 14889      0     0     0
zio_buf_1536                          0x00040    131072     95232    65536     1536      2     2   195     62    62  6045      0     0     0
zio_data_buf_1536                     0x00040  23199744  16723968    65536     1536    354   354   355  10974 10888 11005      0     0     0
zio_buf_2048                          0x00040    589824    337920    65536     2048      9     9   129    225   165  3225      0     0     0
zio_data_buf_2048                     0x00040  24248320  18700288    65536     2048    370   370   371   9250  9131  9275      0     0     0
zio_buf_2560                          0x00040    458752    376320    65536     2560      7     7    98    147   147  2058      0     0     0
zio_data_buf_2560                     0x00040  22347776  18142720    65536     2560    341   341   341   7161  7087  7161      0     0     0
zio_buf_3072                          0x00040    262144    196608    65536     3072      4     4    84     72    64  1512      0     0     0
zio_data_buf_3072                     0x00040  21299200  17774592    65536     3072    325   325   325   5850  5786  5850      0     0     0
zio_buf_3584                          0x00040    262144    222208   131072     3584      2     2    66     62    62  2046      0     0     0
zio_data_buf_3584                     0x00040  21889024  18464768   131072     3584    167   167   170   5177  5152  5270      0     0     0
zio_buf_4096                          0x00040  12058624   4141056   262144     4096     46    40   112   1426  1011  3472      0     0     0
zio_data_buf_4096                     0x00040 770703360 373182464   262144     4096   2940  2939  2939  91140 91109 91109      0     0     0
zio_buf_5120                          0x00040    524288    430080   131072     5120      4     4    55     84    84  1155      0     0     0
zio_data_buf_5120                     0x00040  46923776  38205440   131072     5120    358   358   359   7518  7462  7539      0     0     0
zio_buf_6144                          0x00040   1703936   1437696   131072     6144     13    13    46    234   234   828      0     0     0
zio_data_buf_6144                     0x00040  43909120  36925440   131072     6144    335   335   338   6030  6010  6084      0     0     0
zio_buf_7168                          0x00040         0         0   262144     7168      0     0    23      0     0   713      0     0     0
zio_data_buf_7168                     0x00040  43515904  36764672   262144     7168    166   166   169   5146  5129  5239      0     0     0
zio_buf_8192                          0x00040   4980736   3268608   262144     8192     19    19    41    399   399   861      0     0     0
zio_data_buf_8192                     0x00040  54525952  35627008   262144     8192    208   208   212   4368  4349  4452      0     0     0
zio_buf_10240                         0x00040         0         0   262144    10240      0     0    39      0     0   819      0     0     0
zio_data_buf_10240                    0x00040  86507520  70635520   262144    10240    330   330   333   6930  6898  6993      0     0     0
zio_buf_12288                         0x00040         0         0   524288    12288      0     0    27      0     0   837      0     0     0
zio_data_buf_12288                    0x00040  81264640  58294272   524288    12288    155   155   155   4805  4744  4805      0     0     0
zio_buf_14336                         0x00040         0         0   524288    14336      0     0    17      0     0   527      0     0     0
zio_data_buf_14336                    0x00040  60293120  49846272   524288    14336    115   115   115   3565  3477  3565      0     0     0
zio_buf_16384                         0x00040 1598554112 1031274496   524288    16384   3049  3049  3161  76225 62944 79025      0     0     0
zio_data_buf_16384                    0x00040  62390272  47955968   524288    16384    119   119   119   2975  2927  2975      0     0     0
zio_buf_20480                         0x00040         0         0   524288    20480      0     0    42      0     0   863      0     0     0
zio_data_buf_20480                    0x00040 104333312  84254720   524288    20480    199   199   201   4179  4114  4221      0     0     0
zio_buf_24576                         0x00040         0         0   524288    24576      0     0    41      0     0   738      0     0     0
zio_data_buf_24576                    0x00040  90177536  74907648   524288    24576    172   172   173   3096  3048  3114      0     0     0
zio_buf_28672                         0x00040         0         0  1048576    28672      0     0    20      0     0   620      0     0     0
zio_data_buf_28672                    0x00040  95420416  77758464  1048576    28672     91    91    94   2821  2712  2914      0     0     0
zio_buf_32768                         0x00040         0         0  1048576    32768      0     0    69      0     0  1932      0     0     0
zio_data_buf_32768                    0x00040  79691776  65503232  1048576    32768     76    76    77   2128  1999  2136      0     0     0
zio_buf_36864                         0x00040         0         0  1048576    36864      0     0    19      0     0   475      0     0     0
zio_data_buf_36864                    0x00040 101711872  85708800  1048576    36864     97    97    97   2425  2325  2425      0     0     0
zio_buf_40960                         0x00040         0         0  1048576    40960      0     0    21      0     0   483      0     0     0
zio_data_buf_40960                    0x00040 126877696 106250240  1048576    40960    121   121   121   2783  2594  2783      0     0     0
zio_buf_45056                         0x00040         0         0  1048576    45056      0     0    27      0     0   567      0     0     0
zio_data_buf_45056                    0x00040  98566144  81776640  1048576    45056     94    94    94   1974  1815  1974      0     0     0
zio_buf_49152                         0x00040         0         0  1048576    49152      0     0    26      0     0   494      0     0     0
zio_data_buf_49152                    0x00040 111149056  94027776  1048576    49152    106   106   106   2014  1913  2014      0     0     0
zio_buf_53248                         0x00040         0         0  1048576    53248      0     0    23      0     0   414      0     0     0
zio_data_buf_53248                    0x00040  72351744  59052032  1048576    53248     69    69    69   1242  1109  1242      0     0     0
zio_buf_57344                         0x00040         0         0  1048576    57344      0     0    26      0     0   442      0     0     0
zio_data_buf_57344                    0x00040  47185920  42262528  1048576    57344     45    45    45    765   737   765      0     0     0
zio_buf_61440                         0x00040         0         0  2097152    61440      0     0    16      0     0   466      0     0     0
zio_data_buf_61440                    0x00040  41943040  36311040  2097152    61440     20    20    20    620   591   620      0     0     0
zio_buf_65536                         0x00040         0         0  2097152    65536      0     0    14      0     0   420      0     0     0
zio_data_buf_65536                    0x00040  35651584  28901376  2097152    65536     17    17    19    510   441   570      0     0     0
zio_buf_69632                         0x00040         0         0  2097152    69632      0     0    22      0     0   616      0     0     0
zio_data_buf_69632                    0x00040  35651584  31473664  2097152    69632     17    17    17    476   452   476      0     0     0
zio_buf_73728                         0x00040         0         0  2097152    73728      0     0    24      0     0   624      0     0     0
zio_data_buf_73728                    0x00040  44040192  39297024  2097152    73728     21    21    21    546   533   546      0     0     0
zio_buf_77824                         0x00040         0         0  2097152    77824      0     0    27      0     0   675      0     0     0
zio_data_buf_77824                    0x00040  54525952  48484352  2097152    77824     26    26    26    650   623   650      0     0     0
zio_buf_81920                         0x00040         0         0  2097152    81920      0     0    33      0     0   792      0     0     0
zio_data_buf_81920                    0x00040  48234496  44400640  2097152    81920     23    23    23    552   542   552      0     0     0
zio_buf_86016                         0x00040         0         0  2097152    86016      0     0    22      0     0   506      0     0     0
zio_data_buf_86016                    0x00040  41943040  38019072  2097152    86016     20    20    20    460   442   460      0     0     0
zio_buf_90112                         0x00040         0         0  2097152    90112      0     0    18      0     0   396      0     0     0
zio_data_buf_90112                    0x00040  37748736  32169984  2097152    90112     18    18    20    396   357   440      0     0     0
zio_buf_94208                         0x00040         0         0  2097152    94208      0     0    17      0     0   357      0     0     0
zio_data_buf_94208                    0x00040  25165824  21950464  2097152    94208     12    12    12    252   233   252      0     0     0
zio_buf_98304                         0x00040         0         0  2097152    98304      0     0    28      0     0   560      0     0     0
zio_data_buf_98304                    0x00040  27262976  23887872  2097152    98304     13    13    13    260   243   260      0     0     0
zio_buf_102400                        0x00040         0         0  2097152   102400      0     0    25      0     0   475      0     0     0
zio_data_buf_102400                   0x00040  25165824  20172800  2097152   102400     12    12    12    228   197   228      0     0     0
zio_buf_106496                        0x00040         0         0  2097152   106496      0     0    16      0     0   288      0     0     0
zio_data_buf_106496                   0x00040  31457280  24387584  2097152   106496     15    15    15    270   229   270      0     0     0
zio_buf_110592                        0x00040         0         0  2097152   110592      0     0    19      0     0   325      0     0     0
zio_data_buf_110592                   0x00040  33554432  28311552  2097152   110592     16    16    16    288   256   288      0     0     0
zio_buf_114688                        0x00040         0         0  2097152   114688      0     0    19      0     0   311      0     0     0
zio_data_buf_114688                   0x00040  25165824  20529152  2097152   114688     12    12    12    204   179   204      0     0     0
zio_buf_118784                        0x00040         0         0  2097152   118784      0     0    13      0     0   221      0     0     0
zio_data_buf_118784                   0x00040  25165824  20193280  2097152   118784     12    12    12    204   170   204      0     0     0
zio_buf_122880                        0x00040         0         0  2097152   122880      0     0    13      0     0   208      0     0     0
zio_data_buf_122880                   0x00040  18874368  15605760  2097152   122880      9     9    10    144   127   160      0     0     0
zio_buf_126976                        0x00040         0         0  4194304   126976      0     0    11      0     0   341      0     0     0
zio_data_buf_126976                   0x00040  20971520  15491072  4194304   126976      5     5     5    155   122   155      0     0     0
zio_buf_131072                        0x00040  37748736   9961472  4194304   131072      9     9    47    279    76  1450      0     0     0
zio_data_buf_131072                   0x00040 6014631936 4962779136  4194304   131072   1434  1434  1684  44454 37863 52204      0     0     0
sa_cache                              0x00020 222527488 143411040     4096       80  54328 54328 54331  1792824 1792638 1792923      0     0     0
spill_cache                           0x00040         0         0  4194304   131072      0     0     0      0     0     0      0     0     0
dnode_t                               0x00020 1858338816 1684049952    16384      928  113424 113424 113429  1814784 1814709 1814864      0     0     0
dmu_buf_impl_t                        0x00020 677863424 574939960     8192      280  82747 82747 83583  2068675 2053357 2089575      0     0     0
arc_buf_hdr_t                         0x00020 163201024 140881312     8192      272  19922 19921 19921  517972 517946 517946      0     0     0
arc_buf_t                             0x00020  55218176  38631376     4096      112  13481 13481 15871  350506 344923 412646      0     0     0
zil_lwb_cache                         0x00020    528384    412800     4096      200    129   129   129   2064  2064  2064      0     0     0
zfs_znode_cache                       0x00020 1835827200 1677925080    16384      936  112050 112050 112053  1792800 1792655 1792848      0     0     0

arc stats

name                            type data
hits                            4    10397012
misses                          4    483291
demand_data_hits                4    3971244
demand_data_misses              4    77845
demand_metadata_hits            4    3513236
demand_metadata_misses          4    253398
prefetch_data_hits              4    77958
prefetch_data_misses            4    84806
prefetch_metadata_hits          4    2834574
prefetch_metadata_misses        4    67242
mru_hits                        4    2385761
mru_ghost_hits                  4    92485
mfu_hits                        4    5101244
mfu_ghost_hits                  4    21572
deleted                         4    138
recycle_miss                    4    18287
mutex_miss                      4    16
evict_skip                      4    168
evict_l2_cached                 4    3344460288
evict_l2_eligible               4    13664256
evict_l2_ineligible             4    84999168
hash_elements                   4    443926
hash_elements_max               4    443926
hash_collisions                 4    154100
hash_chains                     4    109623
hash_chain_max                  4    8
p                               4    17759673
c                               4    10998040368
c_min                           4    2147483648
c_max                           4    17179869184
size                            4    10998035512
hdr_size                        4    180145056
data_size                       4    7985608704
other_size                      4    2832281752
anon_size                       4    307566592
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    5848043520
mru_evict_data                  4    5083303936
mru_evict_metadata              4    0
mru_ghost_size                  4    2902842880
mru_ghost_evict_data            4    2750266368
mru_ghost_evict_metadata        4    152576512
mfu_size                        4    1829998592
mfu_evict_data                  4    1576538112
mfu_evict_metadata              4    0
mfu_ghost_size                  4    383837184
mfu_ghost_evict_data            4    20037632
mfu_ghost_evict_metadata        4    363799552
l2_hits                         4    111304
l2_misses                       4    371924
l2_feeds                        4    43424
l2_rw_clash                     4    21
l2_read_bytes                   4    282380288
l2_write_bytes                  4    5715663872
l2_writes_sent                  4    1393
l2_writes_done                  4    1393
l2_writes_error                 4    0
l2_writes_hdr_miss              4    64
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    84
l2_abort_lowmem                 4    6
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    5660558848
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    126
memory_indirect_count           4    4057
arc_no_grow                     4    0
arc_tempreserve                 4    2850816
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    4037600312
arc_meta_limit                  4    4294967296
arc_meta_max                    4    4297072952

@cwedgwood
Contributor Author

with the latest 'stats' spl I get further, but things still explode

[ 2899.886775] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:24014]
[ 2899.893659] Modules linked in: zfs(O)[ 2899.896756] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:24015]
[ 2899.896757] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) loop tcm_loop iscsi_target_mod target_core_mod configfs sg coretemp kvm_intel kvm mpt2sas i2c_i801 ghash_clmulni_intel i2c_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core microcode i7core_edac [last unloaded: spl]
[ 2899.896775] CPU 1 [ 2899.896777] Pid: 24015, comm: z_wr_iss/1 Tainted: G           O 3.6.5.cw1 #25 Supermicro X8DTL/X8DTL
[ 2899.896778] RIP: 0010:[<ffffffffa01670de>]  [<ffffffffa01670de>] spl_kmem_cache_alloc+0x191/0xb97 [spl]
[ 2899.896790] RSP: 0018:ffff880316a07bb0  EFLAGS: 00000246
[ 2899.896791] RAX: 0000000000000000 RBX: ffffffff814fa21d RCX: ffff88032a603620
[ 2899.896792] RDX: ffff88032a6035e8 RSI: 0000000000000286 RDI: ffff88032a603610
[ 2899.896793] RBP: ffff880316a07c80 R08: ffff880316a06000 R09: 0000000000000000
[ 2899.896794] R10: 0140000000000000 R11: ffff880316a07c28 R12: ffff880316a07b28
[ 2899.896795] R13: 0000000000000046 R14: ffff880316a06000 R15: 0000000000000000
[ 2899.896796] FS:  0000000000000000(0000) GS:ffff880333c20000(0000) knlGS:0000000000000000
[ 2899.896798] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2899.896799] CR2: 00007f3a0a97ce02 CR3: 0000000001775000 CR4: 00000000000007e0
[ 2899.896800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2899.896801] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2899.896802] Process z_wr_iss/1 (pid: 24015, threadinfo ffff880316a06000, task ffff8803166e2d40)
[ 2899.896803] Stack:
[ 2899.896803]  ffff880316a07c00 0000000000000000 0000000000000202 ffff88032a605960
[ 2899.896807]  0000003000000000 ffff880216249f40 ffff8803166e2d40 ffff88032a6035f8
[ 2899.896811]  ffff880316a07c40 0000000016249f40 0000000000000212 ffff88032a603560
[ 2899.896814] Call Trace:
[ 2899.896816]  [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 2899.896821]  [<ffffffffa035628d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 2899.896864]  [<ffffffffa0358dc2>] zio_vdev_io_start+0xda/0x223 [zfs]
[ 2899.896894]  [<ffffffffa0357c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 2899.896923]  [<ffffffffa0329647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 2899.896947]  [<ffffffffa0328fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 2899.896972]  [<ffffffffa031dcc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 2899.896996]  [<ffffffffa0358d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 2899.897026]  [<ffffffffa0355d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 2899.897056]  [<ffffffffa0169587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 2899.897062]  [<ffffffff8106394b>] ? finish_task_switch+0x48/0xad
[ 2899.897066]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 2899.897068]  [<ffffffffa01692dd>] ? task_done+0x110/0x110 [spl]
[ 2899.897074]  [<ffffffff81059601>] kthread+0x6f/0x77
[ 2899.897076]  [<ffffffff81501904>] kernel_thread_helper+0x4/0x10
[ 2899.897081]  [<ffffffff814fa21d>] ? retint_restore_args+0x13/0x13
[ 2899.897086]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 2899.897088]  [<ffffffff81501900>] ? gs_change+0x13/0x13



[ 2899.906742] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:24016]
[ 2899.906742] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) loop tcm_loop iscsi_target_mod target_core_mod configfs sg coretemp kvm_intel kvm mpt2sas i2c_i801 ghash_clmulni_intel i2c_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core microcode i7core_edac [last unloaded: spl]
[ 2899.906757] CPU 2 [ 2899.906759] Pid: 24016, comm: z_wr_iss/2 Tainted: G           O 3.6.5.cw1 #25 Supermicro X8DTL/X8DTL
[ 2899.906760] RIP: 0010:[<ffffffffa01670de>]  [<ffffffffa01670de>] spl_kmem_cache_alloc+0x191/0xb97 [spl]
[ 2899.906767] RSP: 0018:ffff880317aa1ae0  EFLAGS: 00000246
[ 2899.906768] RAX: 0000000000000000 RBX: ffffffff814fa21d RCX: ffff88032a603620
[ 2899.906769] RDX: ffff88032a6035e8 RSI: 0000000000000282 RDI: ffff88032a603610
[ 2899.906770] RBP: ffff880317aa1bb0 R08: ffff880317aa0000 R09: 0000000000000000
[ 2899.906772] R10: 0140000000000000 R11: ffff880317aa1b58 R12: ffff880317aa1a58
[ 2899.906773] R13: 0000000000000046 R14: ffff880317aa0000 R15: 0000000000000000
[ 2899.906774] FS:  0000000000000000(0000) GS:ffff880333c40000(0000) knlGS:0000000000000000
[ 2899.906775] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2899.906776] CR2: 00000000006edbd4 CR3: 0000000001775000 CR4: 00000000000007e0
[ 2899.906777] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2899.906778] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2899.906780] Process z_wr_iss/2 (pid: 24016, threadinfo ffff880317aa0000, task ffff8803166e43e0)
[ 2899.906780] Stack:
[ 2899.906781]  00000000000001cc 0000000000000000 ffff880317aa1b20 ffffffffa0164d0b
[ 2899.906784]  ffff880317aa1b50 00000030a0164d0b ffff8803166e43e0 ffff88032a6035f8
[ 2899.906788]  ffff880317aa1b70 00000000a0164e8a 0000000000000206 ffff88032a603560
[ 2899.906791] Call Trace:
[ 2899.906792]  [<ffffffffa0164d0b>] ? kmalloc_nofail+0x2c/0x3e [spl]
[ 2899.906798]  [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 2899.906801]  [<ffffffffa035628d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 2899.906833]  [<ffffffffa032b978>] vdev_raidz_io_start+0x252/0x538 [zfs]
[ 2899.906859]  [<ffffffffa0358ef9>] zio_vdev_io_start+0x211/0x223 [zfs]
[ 2899.906888]  [<ffffffffa0357c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 2899.906917]  [<ffffffffa0329647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 2899.906942]  [<ffffffffa0328fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 2899.906966]  [<ffffffffa031dcc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 2899.906991]  [<ffffffffa0358d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 2899.907020]  [<ffffffffa0355d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 2899.907050]  [<ffffffffa0169587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 2899.907056]  [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[ 2899.907059]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 2899.907061]  [<ffffffffa01692dd>] ? task_done+0x110/0x110 [spl]
[ 2899.907067]  [<ffffffff81059601>] kthread+0x6f/0x77
[ 2899.907069]  [<ffffffff81501904>] kernel_thread_helper+0x4/0x10
[ 2899.907072]  [<ffffffff814fa21d>] ? retint_restore_args+0x13/0x13
[ 2899.907076]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 2899.907078]  [<ffffffff81501900>] ? gs_change+0x13/0x13

recent slab details (1s apart)

zio_buf_4096                          0x00040 419430400  57413632   262144     4096   1600   598  3366  49600 14017 104346      0     0     0
zio_data_buf_4096                     0x00040 7402684416 3272089600   262144     4096  28239 28077 44967  875409 798850 1393977      0     0     0

zio_buf_4096                          0x00040 419430400  15994880   262144     4096   1600   152  3366  49600  3905 104346      0     0     0
zio_data_buf_4096                     0x00040 7402684416 3061850112   262144     4096  28239 28017 44967  875409 747522 1393977      0     0     0

zio_buf_4096                          0x00040 419430400 129372160   262144     4096   1600  1333  3366  49600 31585 104346      0     0     0
zio_data_buf_4096                     0x00040 7402684416 3090554880   262144     4096  28239 28001 44967  875409 754530 1393977      0     0     0

zio_buf_4096                          0x00040 419430400   3411968   262144     4096   1600    44  3366  49600   833 104346      0     0     0
zio_data_buf_4096                     0x00040 7402684416 3036553216   262144     4096  28239 27999 44967  875409 741346 1393977      0     0     0

zio_buf_4096                          0x00040 419430400   3149824   262144     4096   1600    44  3366  49600   769 104346      0     0     0
zio_data_buf_4096                     0x00040 7402684416 3198296064   262144     4096  28239 27987 44967  875409 780834 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   7737344   262144     4096   1508   360  3366  46748  1889 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 3056738304   262144     4096  28095 27931 44967  870945 746274 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   6295552   262144     4096   1508   142  3366  46748  1537 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 2998018048   262144     4096  28095 27865 44967  870945 731938 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   4853760   262144     4096   1508   154  3366  46748  1185 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 3104579584   262144     4096  28095 27865 44967  870945 757954 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   2625536   262144     4096   1508   119  3366  46748   641 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 3196985344   262144     4096  28095 27865 44967  870945 780514 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   3411968   262144     4096   1508   139  3366  46748   833 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 3319275520   262144     4096  28095 27865 44967  870945 810370 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   2756608   262144     4096   1508   143  3366  46748   673 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 3342082048   262144     4096  28095 27865 44967  870945 815938 1393977      0     0     0

zio_buf_4096                          0x00040 395313152   2887680   262144     4096   1508   104  3366  46748   705 104346      0     0     0
zio_data_buf_4096                     0x00040 7364935680 3359514624   262144     4096  28095 27865 44967  870945 820194 1393977      0     0     0

zio_buf_4096                          0x00040 315621376  57151488   262144     4096   1204   451  3366  37324 13953 104346      0     0     0
zio_data_buf_4096                     0x00040 7431782400 3599507456   262144     4096  28350 28350 44967  878850 878786 1393977      0     0     0

zio_buf_4096                          0x00040 315621376  12062720   262144     4096   1204    95  3366  37324  2945 104346      0     0     0
zio_data_buf_4096                     0x00040 7464812544 3294253056   262144     4096  28476 28337 44967  882756 804261 1393977      0     0     0

zio_buf_4096                          0x00040 315621376  25956352   262144     4096   1204   205  3366  37324  6337 104346      0     0     0
zio_data_buf_4096                     0x00040 7464812544 3328724992   262144     4096  28476 28333 44967  882756 812677 1393977      0     0     0

zio_buf_4096                          0x00040 315621376   5246976   262144     4096   1204    44  3366  37324  1281 104346      0     0     0
zio_data_buf_4096                     0x00040 7464812544 3147714560   262144     4096  28476 28144 44967  882756 768485 1393977      0     0     0

zio_buf_4096                          0x00040 277872640   2887680   262144     4096   1060   122  3366  32860   705 104346      0     0     0
zio_data_buf_4096                     0x00040 7422869504 3233042432   262144     4096  28316 28142 44967  877796 789317 1393977      0     0     0

@behlendorf
Contributor

@cwedgwood I was able to reproduce a failure in my Fedora 17 VM with the 3.6.3-1 debug kernel. While different from your previous debug messages, it would explain the hard lockup you're seeing. The following GPF, when hit multiple times in kernel context, would absolutely cause real hardware to hang as you've described.

I suspect the GPF may be caused by the condition variable in the range lock being destroyed, and the memory freed, before all of the waiters have been notified. We had a similar issue with this exact lock some time ago and I thought I had resolved it, but perhaps there's still a tiny race. This is actually an upstream bug; it just happens that the Solaris condition variable implementation isn't particularly sensitive to this kind of misuse, typically because the condition variable is embedded in a fairly long-lived structure.
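
To make the suspected misuse concrete, here is a minimal user-space sketch of the pattern (an illustration only, not the actual ZFS range-lock code: POSIX threads stand in for the in-kernel cv_wait()/cv_broadcast()/cv_destroy() primitives, and the range_lock structure and its fields are invented for the example). The signalling thread destroys the condition variable and frees the structure that embeds it while the waiter may still be inside the wakeup path; in the kernel the freed slab memory is poisoned with 0x6b, which is exactly the RDX=6b6b6b6b6b6b6b6b visible in the trace below.

/* cv_uaf_sketch.c -- illustrative only, NOT the ZFS/SPL code. */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct range_lock {                          /* hypothetical stand-in */
        pthread_mutex_t rl_mtx;
        pthread_cond_t  rl_cv;               /* CV embedded in a short-lived object */
        int             rl_done;
};

static void *waiter(void *arg)
{
        struct range_lock *rl = arg;

        pthread_mutex_lock(&rl->rl_mtx);
        while (!rl->rl_done)
                pthread_cond_wait(&rl->rl_cv, &rl->rl_mtx);
        pthread_mutex_unlock(&rl->rl_mtx);   /* may run after rl was freed */
        return NULL;
}

int main(void)
{
        struct range_lock *rl = malloc(sizeof(*rl));
        pthread_t t;

        pthread_mutex_init(&rl->rl_mtx, NULL);
        pthread_cond_init(&rl->rl_cv, NULL);
        rl->rl_done = 0;

        pthread_create(&t, NULL, waiter, rl);
        sleep(1);                            /* let the waiter block on the CV */

        pthread_mutex_lock(&rl->rl_mtx);
        rl->rl_done = 1;
        pthread_cond_broadcast(&rl->rl_cv);
        pthread_mutex_unlock(&rl->rl_mtx);

        pthread_cond_destroy(&rl->rl_cv);    /* BUG: waiter may still be in the */
        free(rl);                            /* wait path -> use-after-free     */

        pthread_join(t, NULL);
        return 0;
}

A long-lived enclosing structure hides this bug because the memory stays valid; here the object is torn down immediately, which is the case the range lock can hit.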

No patch yet, but I'll chew on it now that I've found a concrete issue.

general protection fault: 0000 [#1] SMP
CPU 6
Pid: 1388, comm: zvol/14 Tainted: G           O 3.6.3-1.fc17.x86_64.debug #1 Bochs Bochs
RIP: 0010:[]  [] __wake_up_common+0x2b/0x90
RSP: 0018:ffff880051809ab0  EFLAGS: 00010082
RAX: 0000000000000286 RBX: ffff88004c423298 RCX: 0000000000000000
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000003 RDI: ffff88004c423298
RBP: ffff880051809af0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004c4232e0
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f9e38112000 CR3: 000000001c077000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zvol/14 (pid: 1388, threadinfo ffff880051808000, task ffff880051800000)
Stack:
 0000000151809af0 0000000000000000 ffffffff810a1172 ffff88004c423298
 0000000000000286 0000000000000003 0000000000000001 0000000000000000
 ffff880051809b30 ffffffff810a1188 ffff880051a8a5b0 ffff88004c423238
Call Trace:
 [] ? __wake_up+0x32/0x70
 [] __wake_up+0x48/0x70
 [] cv_wait_common+0x1b8/0x3d0 [spl]
 [] ? wake_up_bit+0x40/0x40
 [] __cv_wait+0x13/0x20 [spl]
 [] zfs_range_lock+0x4d6/0x620 [zfs]
 [] zvol_get_data+0x89/0x150 [zfs]
 [] zil_commit+0x5a2/0x770 [zfs]
 [] zvol_write+0x1b2/0x480 [zfs]
 [] taskq_thread+0x250/0x820 [spl]
 [] ? finish_task_switch+0x3f/0x120
 [] ? try_to_wake_up+0x340/0x340
 [] ? __taskq_create+0x6e0/0x6e0 [spl]
 [] kthread+0xb7/0xc0
 [] kernel_thread_helper+0x4/0x10
 [] ? retint_restore_args+0x13/0x13
 [] ? __init_kthread_worker+0x70/0x70
 [] ? gs_change+0x13/0x13
Code: 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 e8 2a 3c 64 00 89 55 c4 48 8b 57 48 4c 8d 67 48 41 89 f7 41 89 ce 4c 89 45 c8 <4c> 8b 2a 48 8d 42 e8 49 83 ed 18 49 39 d4
RIP  [] __wake_up_common+0x2b/0x90
 RSP 
BUG: scheduling while atomic: zvol/14/1388/0x10000002
INFO: lockdep is turned off. 
Modules linked in: xfs zfs(O) zcommon(O) zunicode(O) znvpair(O) zavl(O) splat(O) spl(O) zlib_deflate lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_conntrack
Pid: 1388, comm: zvol/14 Tainted: G      D    O 3.6.3-1.fc17.x86_64.debug #1
Call Trace:
 [] __schedule_bug+0x67/0x75
 [] __schedule+0x98b/0x9f0
 [] __cond_resched+0x2a/0x40
 [] _cond_resched+0x30/0x40
 [] mutex_lock_nested+0x33/0x390
 [] ? exit_fs+0x47/0xa0
 [] perf_event_exit_task+0x30/0x220
 [] do_exit+0x1d5/0xb00
 [] ? kmsg_dump+0x1b8/0x240
 [] ? kmsg_dump+0x25/0x240
 [] oops_end+0x9d/0xe0
 [] die+0x58/0x90
 [] do_general_protection+0x162/0x170
 [] ? restore_args+0x30/0x30
 [] general_protection+0x25/0x30
 [] ? __wake_up_common+0x2b/0x90
 [] ? __wake_up+0x32/0x70
 [] __wake_up+0x48/0x70
 [] cv_wait_common+0x1b8/0x3d0 [spl]
 [] ? wake_up_bit+0x40/0x40
 [] __cv_wait+0x13/0x20 [spl]
 [] zfs_range_lock+0x4d6/0x620 [zfs]
 [] zvol_get_data+0x89/0x150 [zfs]
 [] zil_commit+0x5a2/0x770 [zfs]
 [] zvol_write+0x1b2/0x480 [zfs]
 [] taskq_thread+0x250/0x820 [spl]
 [] ? finish_task_switch+0x3f/0x120
 [] ? try_to_wake_up+0x340/0x340
 [] ? __taskq_create+0x6e0/0x6e0 [spl]
 [] kthread+0xb7/0xc0
 [] kernel_thread_helper+0x4/0x10
 [] ? retint_restore_args+0x13/0x13
 [] ? __init_kthread_worker+0x70/0x70
 [] ? gs_change+0x13/0x13

@behlendorf
Contributor

@cwedgwood Another possibility occurred to me which I think fits all the facts. I'm still going to verify the CVs are 100% solid, but the crash might also be the result of a stack overflow.

In the Linux kernel we only have 8k of stack space, which isn't a lot of elbow room. Under Solaris/FreeBSD the default is 24k, and ZFS was originally written with that limit in mind. I've gone through a lot of effort to bring the ZFS stack usage down, and for all the workloads I'm aware of stack overruns never occur. The only remaining overrun I'm positive about occurs when running ZFS over multipath devices (#675). The biggest consumer is usually the stack needed to recursively traverse an object's block pointers.

However, your zvol+xfs workload may just push things over the edge too. Running xfs on a zvol is probably close to the worst case: you have a single object and you're allocating small blocks all over the entire virtual device. Add to that the usual xfs stack overhead (which historically has been substantial) and you may trash the stack.

We can test whether this is the problem by increasing the default Linux stack size to 16k. This can be done safely by recompiling the kernel with THREAD_SIZE_ORDER=2. See the top of arch/x86/include/asm/page_64_types.h for x86_64 machines; THREAD_SIZE_ORDER just needs to be changed from 1 to 2. The only side effects of this change are a little more memory usage and a possibly slower fork()/exec() rate.

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64
index 320f7bb..8fcc6cb 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,7 +1,7 @@
 #ifndef _ASM_X86_PAGE_64_DEFS_H
 #define _ASM_X86_PAGE_64_DEFS_H

-#define THREAD_SIZE_ORDER      1
+#define THREAD_SIZE_ORDER      2
 #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
 #define CURRENT_MASK (~(THREAD_SIZE - 1))
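
For reference, with the standard 4 KiB page size on x86_64 the arithmetic works out as follows:

/* THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER, PAGE_SIZE = 4096:
 *   order 1: 4096 << 1 =  8192 bytes ( 8 KiB, the stock kernel stack)
 *   order 2: 4096 << 2 = 16384 bytes (16 KiB, with the patch above)
 */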

@cwedgwood
Contributor Author

all recent 'stats' stuff, 16K kernel stacks, wedged hard

[ 3482.972444] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:5096]
...
[ 3482.982490] Call Trace:
[ 3482.982492]  [<ffffffffa026821e>] ? zio_create+0x31f/0x331 [zfs]
[ 3482.982543]  [<ffffffffa026728d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 3482.982572]  [<ffffffffa0269dc2>] zio_vdev_io_start+0xda/0x223 [zfs]
[ 3482.982601]  [<ffffffffa0268c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 3482.982631]  [<ffffffffa023a647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 3482.982655]  [<ffffffffa0239fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 3482.982679]  [<ffffffffa022ecc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 3482.982703]  [<ffffffffa0269d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 3482.982732]  [<ffffffffa0266d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 3482.982762]  [<ffffffffa007e587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 3482.982768]  [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[ 3482.982773]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 3482.982776]  [<ffffffffa007e2dd>] ? task_done+0x110/0x110 [spl]
[ 3482.982782]  [<ffffffff81059601>] kthread+0x6f/0x77
[ 3482.982784]  [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[ 3482.982789]  [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[ 3482.982794]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 3482.982796]  [<ffffffff81501980>] ? gs_change+0x13/0x13


[ 3482.992428] CPU 2 [ 3482.992430] Pid: 5098, comm: z_wr_iss/2 Tainted: G           O 3.6.6.cw1 #2 Supermicro X8DTL/X8DTL
...
[ 3482.992462] Call Trace:
[ 3482.992464]  [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 3482.992468]  [<ffffffffa026728d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 3482.992500]  [<ffffffffa0269dc2>] zio_vdev_io_start+0xda/0x223 [zfs]
[ 3482.992529]  [<ffffffffa0268c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 3482.992559]  [<ffffffffa023a647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 3482.992583]  [<ffffffffa0239fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 3482.992606]  [<ffffffffa022ecc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 3482.992631]  [<ffffffffa0269d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 3482.992660]  [<ffffffffa0266d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 3482.992689]  [<ffffffffa007e587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 3482.992694]  [<ffffffff8106394b>] ? finish_task_switch+0x48/0xad
[ 3482.992697]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 3482.992700]  [<ffffffffa007e2dd>] ? task_done+0x110/0x110 [spl]
[ 3482.992705]  [<ffffffff81059601>] kthread+0x6f/0x77
[ 3482.992708]  [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[ 3482.992712]  [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[ 3482.992715]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 3482.992717]  [<ffffffff81501980>] ? gs_change+0x13/0x13


[ 3483.012398] CPU 3 [ 3483.012400] Pid: 5099, comm: z_wr_iss/3 Tainted: G           O 3.6.6.cw1 #2 Supermicro X8DTL/X8DTL
...
[ 3483.012433] Call Trace:
[ 3483.012434]  [<ffffffffa0268c8e>] ? zio_nowait+0xdc/0x103 [zfs]
[ 3483.012465]  [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 3483.012469]  [<ffffffffa026728d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 3483.012498]  [<ffffffffa0267476>] zio_write_bp_init+0x19b/0x3d9 [zfs]
[ 3483.012527]  [<ffffffffa0266d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 3483.012557]  [<ffffffffa007e587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 3483.012563]  [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[ 3483.012566]  [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 3483.012568]  [<ffffffffa007e2dd>] ? task_done+0x110/0x110 [spl]
[ 3483.012574]  [<ffffffff81059601>] kthread+0x6f/0x77
[ 3483.012576]  [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[ 3483.012579]  [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[ 3483.012582]  [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 3483.012585]  [<ffffffff81501980>] ? gs_change+0x13/0x13
...
...

@cwedgwood
Contributor Author

process usage when it died

 5540 root      20   0 16988 1100  736 D    80  0.0   0:03.27 cp                                                                                                                                    
  660 root      20   0     0    0    0 S    47  0.0   3:29.34 kswapd1                                                                                                                               
  659 root      20   0     0    0    0 S    33  0.0   4:50.87 kswapd0                                                                                                                               
 5100 root      39  19     0    0    0 R    14  0.0   0:43.89 z_wr_iss/4                                                                                                                            
 5099 root      39  19     0    0    0 R    13  0.0   0:40.80 z_wr_iss/3                                                                                                                            
 5103 root      39  19     0    0    0 R    13  0.0   0:45.24 z_wr_iss/7                                                                                                                            
 5296 root      20   0     0    0    0 D    13  0.0   0:20.85 flush-240:0                                                                                                                           
 5101 root      39  19     0    0    0 R    12  0.0   0:46.35 z_wr_iss/5                                                                                                                            
 5105 root      39  19     0    0    0 R    11  0.0   0:42.47 z_wr_iss/9                                                                                                                            
 5097 root      39  19     0    0    0 R    11  0.0   0:40.20 z_wr_iss/1                                                                                                                            
 5106 root      39  19     0    0    0 R    11  0.0   0:41.30 z_wr_iss/10                                                                                                                           
 4510 root       0 -20     0    0    0 D    10  0.0   0:12.95 arc_adapt                                                                                                                             
 5104 root      39  19     0    0    0 R    10  0.0   0:41.82 z_wr_iss/8                                                                                                                            
 5096 root      39  19     0    0    0 R     9  0.0   0:34.80 z_wr_iss/0                                                                                                                            
 5168 root       0 -20     0    0    0 R     9  0.0   1:48.36 txg_sync                                                                                                                              
 5111 root      39  19     0    0    0 R     9  0.0   0:48.71 z_wr_iss/15                                                                                                                           
 5102 root      39  19     0    0    0 R     8  0.0   0:44.47 z_wr_iss/6                                                                                                                            
 5527 root      20   0     0    0    0 S     8  0.0   0:00.24 kworker/6:2                                                                                                                           
 5109 root      39  19     0    0    0 R     8  0.0   0:48.54 z_wr_iss/13                                                                                                                           
 4517 root      39  19     0    0    0 R     7  0.0   0:21.80 zvol/5                                                                                                                                
 4523 root      39  19     0    0    0 D     7  0.0   0:22.76 zvol/11                                                                                                                               
 5110 root      39  19     0    0    0 R     7  0.0   0:49.87 z_wr_iss/14                                                                                                                           
 4521 root      39  19     0    0    0 D     7  0.0   0:23.62 zvol/9                                                                                                                                
 4514 root      39  19     0    0    0 D     6  0.0   0:21.76 zvol/2                                                                                                                                
 4519 root      39  19     0    0    0 R     6  0.0   0:22.07 zvol/7                                                                                                                                
 5108 root      39  19     0    0    0 R     6  0.0   0:47.55 z_wr_iss/12                                                                                                                           
 5119 root      39  19     0    0    0 R     6  0.0   0:23.89 z_wr_int/2                                                                                                                            
 5121 root      39  19     0    0    0 R     6  0.0   0:24.50 z_wr_int/4                                                                                                                            
 5124 root      39  19     0    0    0 R     6  0.0   0:25.13 z_wr_int/7                                                                                                                            
 5128 root      39  19     0    0    0 S     6  0.0   0:26.45 z_wr_int/11                                                                                                                           
 5132 root      39  19     0    0    0 S     6  0.0   0:28.03 z_wr_int/15                                                                                                                           
 4512 root      39  19     0    0    0 D     6  0.0   0:19.92 zvol/0                                                                                                                                
 4520 root      39  19     0    0    0 R     6  0.0   0:24.60 zvol/8                                                                                                                                
 5107 root      39  19     0    0    0 R     6  0.0   0:42.40 z_wr_iss/11                                                                                                                           
 5098 root      39  19     0    0    0 R     5  0.0   0:40.29 z_wr_iss/2                                                                                                                            
 5130 root      39  19     0    0    0 R     5  0.0   0:28.25 z_wr_int/13                                                                                                                           
 4515 root      39  19     0    0    0 R     5  0.0   0:21.89 zvol/3                                                                                                                                
 4518 root      39  19     0    0    0 R     5  0.0   0:22.06 zvol/6                                                                                                                                
 4522 root      39  19     0    0    0 D     5  0.0   0:23.19 zvol/10                                                                                                                               
 5129 root      39  19     0    0    0 S     5  0.0   0:28.59 z_wr_int/12                                                                                                                           
 5117 root      39  19     0    0    0 R     4  0.0   0:21.84 z_wr_int/0                                                                                                                            
 5122 root      39  19     0    0    0 R     4  0.0   0:25.51 z_wr_int/5                                                                                                                            
 5126 root      39  19     0    0    0 R     4  0.0   0:25.81 z_wr_int/9                                                                                                                            
 5127 root      39  19     0    0    0 S     4  0.0   0:26.74 z_wr_int/10                                                                                                                           
 5118 root      39  19     0    0    0 R     4  0.0   0:23.82 z_wr_int/1                                                                                                                            
 5125 root      39  19     0    0    0 R     4  0.0   0:27.34 z_wr_int/8                                                                                                                            
 4513 root      39  19     0    0    0 R     3  0.0   0:23.42 zvol/1                                                                                                                                
 1100 root      20   0     0    0    0 S     3  0.0   0:12.34 kworker/11:1                                                                                                                          
 4516 root      39  19     0    0    0 R     3  0.0   0:22.11 zvol/4                                                                                                                                
 5120 root      39  19     0    0    0 R     3  0.0   0:23.90 z_wr_int/3                                                                                                                            
 5123 root      39  19     0    0    0 R     3  0.0   0:25.13 z_wr_int/6                                                                                                                            
 1063 root      20   0     0    0    0 R     3  0.0   0:15.97 kworker/2:1                                                                                                                           
 5131 root      39  19     0    0    0 R     3  0.0   0:28.39 z_wr_int/14                                                                                                                           
 5396 root      20   0     0    0    0 S     3  0.0   0:08.12 kworker/9:8                                                                                                                           
 5415 root      20   0     0    0    0 S     3  0.0   0:10.79 kworker/3:11                                                                                                                          
 5536 root      20   0     0    0    0 R     3  0.0   0:01.82 kworker/5:8                                                                                                                           

behlendorf added a commit to openzfs/spl that referenced this issue Nov 9, 2012
This branch contains kmem cache optimizations designed to resolve
the lockups reported in openzfs/zfs#922.  The lockups were
largely the result of spin lock contention in the slab under low
memory conditions.  Fundamentally, these changes are all designed
to minimize that contention through a variety of methods.

  * Improved vmem cached deadlock detection
  * Track emergency objects in rbtree
  * Optimize spl_kmem_cache_free()
  * Never spin in kmem_cache_alloc()

Signed-off-by: Brian Behlendorf <[email protected]>
openzfs/zfs#922
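
To give a rough idea of what "never spin in kmem_cache_alloc()" and "track emergency objects" mean, here is a simplified user-space sketch, assuming nothing about the actual SPL implementation (which uses a kernel rbtree, spinlocks, and kmem/vmem rather than the tsearch(), pthread mutex, and malloc() used here): when the cache's freelist is exhausted, the allocator hands back a directly allocated emergency object and records its address so the free path can tell the two kinds apart, instead of spinning until the slab is refilled.

#include <pthread.h>
#include <search.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SLAB_OBJS 2                          /* tiny pretend slab for the example */

struct cache {
        void           *freelist[SLAB_OBJS];
        int             nfree;
        size_t          objsize;
        void           *emergency_tree;      /* tsearch() root of emergency objects */
        pthread_mutex_t lock;
};

static int ptrcmp(const void *a, const void *b)
{
        uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
        return (x > y) - (x < y);
}

static void cache_init(struct cache *c, size_t objsize)
{
        c->objsize = objsize;
        c->emergency_tree = NULL;
        pthread_mutex_init(&c->lock, NULL);
        for (c->nfree = 0; c->nfree < SLAB_OBJS; c->nfree++)
                c->freelist[c->nfree] = malloc(objsize);
}

static void *cache_alloc(struct cache *c)
{
        void *obj;

        pthread_mutex_lock(&c->lock);
        if (c->nfree > 0) {                  /* fast path: object on the freelist */
                obj = c->freelist[--c->nfree];
                pthread_mutex_unlock(&c->lock);
                return obj;
        }
        pthread_mutex_unlock(&c->lock);

        /* Slow path: do not spin waiting for a refill; allocate an
         * emergency object and remember it so cache_free() can tell. */
        obj = malloc(c->objsize);
        if (obj != NULL) {
                pthread_mutex_lock(&c->lock);
                tsearch(obj, &c->emergency_tree, ptrcmp);
                pthread_mutex_unlock(&c->lock);
        }
        return obj;
}

static void cache_free(struct cache *c, void *obj)
{
        pthread_mutex_lock(&c->lock);
        if (tfind(obj, &c->emergency_tree, ptrcmp) != NULL) {
                tdelete(obj, &c->emergency_tree, ptrcmp);
                pthread_mutex_unlock(&c->lock);
                free(obj);                   /* emergency object: release it */
                return;
        }
        c->freelist[c->nfree++] = obj;       /* cache object: back on the freelist */
        pthread_mutex_unlock(&c->lock);
}

int main(void)
{
        struct cache c;
        void *a, *b, *e;

        cache_init(&c, 4096);
        a = cache_alloc(&c);                 /* from the freelist */
        b = cache_alloc(&c);                 /* from the freelist */
        e = cache_alloc(&c);                 /* freelist empty -> emergency object */
        printf("slab: %p %p  emergency: %p\n", a, b, e);
        cache_free(&c, e);
        cache_free(&c, b);
        cache_free(&c, a);
        return 0;
}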
@behlendorf
Contributor

The kmem cache improvements which prevent the system from thrashing and hanging have been merged into master. For those impacted by this issue, it would be appreciated if you could test the latest code.
