Hang under memory pressure #2308
The same problem happens with the stable version 0.6.2-r5 (from the Gentoo repo):
[ 383.521769] INFO: task systemd:1 blocked for more than 20 seconds.
I had a similar issue running the latest stable ZFS (0.6.3, g07dabd2) on Arch with 8 GB of ECC memory, while also running a Minecraft server that uses up to 2 GB.
The machine eventually became responsive again and I rebooted it.
It's hard to say for certain, but it's likely that this is a duplicate of #2523.
I'm now hitting this consistently. It seems to require that the system has been up for about 24 hours and has been used to copy a moderate amount of data (e.g. 10+ GB).
Rebooting fixes the issue for a while.
"Plenty" of free memory:
Some more details:
Can you dump the rest of the threads using SysRq-t when this happens again? One of the z_* threads should be handling some part of that I/O, and it would be useful to see what it's blocked on.
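If the keyboard combination (Alt+SysRq+T) is not convenient, the same task dump can be requested by writing `t` to `/proc/sysrq-trigger` as root, which is what `echo t > /proc/sysrq-trigger` does. The minimal Python sketch below does the same thing; the helper name `trigger_sysrq` is hypothetical, and the `path` argument exists only so the function can be exercised against an ordinary file. The resulting stack traces land in the kernel log and can be read with `dmesg`.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # kernel interface; writing to it requires root


def trigger_sysrq(key: str, path: str = SYSRQ_TRIGGER) -> None:
    """Write one SysRq command character to the trigger file (hypothetical helper).

    'nt' is rejected; 't' asks the kernel to dump the state and stack of every
    task to the kernel log, where dmesg can read it back.
    """
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    with open(path, "w") as f:
        f.write(key)
```

Calling `trigger_sysrq("t")` while the hang is in progress should capture the `z_*` threads alongside the tasks that the hung-task detector already reports.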
I'm fairly certain I'm running into this issue. My ZFS machine has only 2 GB of RAM and loads the module with zfs_arc_max=1610612736. Under heavy I/O, every 10-30 minutes the entire pool locks up for a minute or two, but afterwards it goes back to working. This is on Ubuntu 14.04. Right before it unlocks, dmesg fills up with this: https://gist.github.com/LordQuackstar/a05c1a0b4a166b3eb2fe
SysRq-t output in dmesg: https://gist.github.com/LordQuackstar/2e3517c5a3df0d755e40
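For scale, a quick calculation of what that `zfs_arc_max` value means, assuming "2GB of RAM" is 2 GiB (2 × 2^30 bytes): the cap is exactly 1.5 GiB, i.e. 75% of physical memory, which would leave about 512 MiB for the kernel, userspace, and page cache if the ARC actually stays within its cap. The function name `arc_fraction` is purely illustrative.

```python
GIB = 2**30  # bytes per GiB

def arc_fraction(arc_max_bytes: int, ram_bytes: int) -> float:
    """Fraction of physical RAM the ARC is allowed to grow to (illustrative helper)."""
    return arc_max_bytes / ram_bytes

arc_max = 1610612736   # zfs_arc_max value from the comment above
ram = 2 * GIB          # assumption: "2GB" means 2 GiB

print(arc_max / GIB)                 # 1.5  (GiB)
print(arc_fraction(arc_max, ram))    # 0.75
print((ram - arc_max) // 2**20)      # 512  (MiB left outside the ARC cap)
```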
This happened again, but the machine eventually recovered. I just updated to the latest Arch release (0.6.3) and will report back. @behlendorf I'll try to catch it and give you the details you asked for.
Here is the dmesg: https://gist.github.com/ioquatix/3436e16a83ea867a5885
@ioquatix Would you provide a description of your hardware and pool configuration?
@ioquatix I suspect that the kmem rework in openzfs/spl#369 and #2411 would have an effect here. I need to refresh that pull request as per @behlendorf's request, and I will try to do that later today. Those pull requests resolve the atime issue that I described in my blog post, as well as some other problems. It is conceivable that you are hitting one of them.
@ryao Once the update is available, let me know and I will try it.
Closing as stale.
I'm almost completely convinced that the problems I was experiencing were due to faulty hardware.
Hang (sometimes complete, sometimes recoverable with SysRq+E) on a computer with 4 GB of physical memory under heavy memory pressure.
Kernel: 3.14.2-aufs #4 SMP PREEMPT
spl: 0.6.2-36_g703371d
zfs: 0.6.2-274_g2c33b91
The following kernel settings are in use:
Hang detection timeout reduced to 20 seconds (for debugging)
CONFIG_TRANSPARENT_HUGEPAGE is not set
vm.dirty_ratio = 10
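Of the settings listed, `CONFIG_TRANSPARENT_HUGEPAGE` is a build-time kernel option, while the hang-detection timeout (`kernel.hung_task_timeout_secs`, the knob named in the log messages) and `vm.dirty_ratio` are runtime sysctls exposed as files under `/proc/sys`. The sketch below shows that mapping in Python, roughly what `sysctl vm.dirty_ratio` and `sysctl -w vm.dirty_ratio=10` do; the helper names are hypothetical, and the simple dot-to-slash conversion ignores the rare sysctl names whose components themselves contain dots.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name such as 'vm.dirty_ratio' to its file under /proc/sys.

    Simplification: every '.' is treated as a path separator.
    """
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Current value as text, like `sysctl -n name`."""
    return sysctl_path(name, root).read_text().strip()


def write_sysctl(name: str, value, root: str = "/proc/sys") -> None:
    """Set a value, like `sysctl -w name=value`; needs root on the real /proc/sys."""
    sysctl_path(name, root).write_text(f"{value}\n")


# The runtime settings from the report above.
SETTINGS = {
    "kernel.hung_task_timeout_secs": 20,  # hang detection, lowered for debugging
    "vm.dirty_ratio": 10,
}
```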
What else can be done to reduce the chance of hitting this condition?
[ 2341.763203] INFO: task systemd:1 blocked for more than 20 seconds.
[ 2341.763207] Tainted: P W O 3.14.2-aufs #4
[ 2341.763208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2341.763209] systemd D ffff88012fb93280 4072 1 0 0x00000000
[ 2341.763213] ffff880129ac3628 0000000000000002 ffff880129ae0000 ffff880129ac3fd8
[ 2341.763215] 0000000000013280 0000000000013280 ffff880129ae0000 ffff8800c6d1b890
[ 2341.763217] ffff8800c6d1b890 ffff880129ae0000 ffff8800c6d1b898 0000000000000246
[ 2341.763220] Call Trace:
[ 2341.763226] [] schedule_preempt_disabled+0x2e/0x80
[ 2341.763229] [] __mutex_lock_slowpath+0x12f/0x3a0
[ 2341.763231] [] mutex_lock+0x23/0x33
[ 2341.763251] [] vdev_queue_io+0x85/0x180 [zfs]
[ 2341.763262] [] zio_buf_free+0xc07/0x17b0 [zfs]
[ 2341.763273] [] zio_nowait+0xb6/0x180 [zfs]
[ 2341.763284] [] vdev_config_sync+0xaff/0xd80 [zfs]
[ 2341.763295] [] ? vdev_config_sync+0x140/0xd80 [zfs]
[ 2341.763305] [] zio_buf_free+0xc57/0x17b0 [zfs]
[ 2341.763315] [] zio_nowait+0xb6/0x180 [zfs]
[ 2341.763321] [] arc_read+0x31a/0x960 [zfs]
[ 2341.763327] [] dbuf_read+0x254/0xd00 [zfs]
[ 2341.763334] [] dbuf_read+0xc7a/0xd00 [zfs]
[ 2341.763340] [] dbuf_hold_impl+0x76/0xa0 [zfs]
[ 2341.763346] [] dbuf_hold+0x1b/0x30 [zfs]
[ 2341.763353] [] dmu_buf_hold+0x29d/0x6b0 [zfs]
[ 2341.763360] [] dmu_read+0x97/0x2d0 [zfs]
[ 2341.763362] [] ? mutex_unlock+0x9/0x10
[ 2341.763373] [] zfs_getpage+0x12d/0x200 [zfs]
[ 2341.763383] [] ? zpl_putpage+0x220/0x830 [zfs]
[ 2341.763392] [] zpl_putpage+0x247/0x830 [zfs]
[ 2341.763395] [] read_cache_pages+0xba/0x120
[ 2341.763404] [] zpl_putpage+0x99/0x830 [zfs]
[ 2341.763406] [] __do_page_cache_readahead+0x1d4/0x290
[ 2341.763408] [] ra_submit+0x1c/0x20
[ 2341.763410] [] filemap_fault+0x385/0x420
[ 2341.763412] [] __do_fault+0x6e/0x530
[ 2341.763414] [] handle_mm_fault+0x1c3/0xcb0
[ 2341.763416] [] ? _raw_spin_unlock_irqrestore+0x19/0x40
[ 2341.763419] [] ? timerfd_poll+0x50/0x60
[ 2341.763421] [] ? ep_send_events_proc+0x9f/0x1c0
[ 2341.763423] [] __do_page_fault+0x16c/0x580
[ 2341.763426] [] ? acct_account_cputime+0x17/0x20
[ 2341.763428] [] ? account_user_time+0x87/0x90
[ 2341.763430] [] ? _raw_spin_unlock+0x13/0x30
[ 2341.763432] [] ? vtime_account_user+0x4f/0x60
[ 2341.763434] [] do_page_fault+0x1e/0x70
[ 2341.763436] [] page_fault+0x22/0x30
[ 2341.763441] INFO: task kswapd0:62 blocked for more than 20 seconds.
[ 2341.763442] Tainted: P W O 3.14.2-aufs #4
[ 2341.763443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2341.763444] kswapd0 D ffff88012664be00 3464 62 2 0x00000000
[ 2341.763446] ffff880124fad968 0000000000000002 ffff880129390000 ffff880124fadfd8
[ 2341.763448] 0000000000013280 0000000000013280 ffff880129390000 ffff8800c6a4e428
[ 2341.763450] ffff8800c6a4e268 ffff8800c6a4e430 0000000000000000 ffff8800c6a4e3c0
[ 2341.763452] Call Trace:
[ 2341.763454] [] schedule+0x24/0x70
[ 2341.763459] [] __cv_destroy+0x19d/0x1d0 [spl]
[ 2341.763462] [] ? prepare_to_wait_event+0xf0/0xf0
[ 2341.763466] [] __cv_wait+0x10/0x20 [spl]
[ 2341.763476] [] txg_wait_open+0x83/0xb0 [zfs]
[ 2341.763485] [] dmu_tx_wait+0x305/0x310 [zfs]
[ 2341.763487] [] ? mutex_unlock+0x9/0x10
[ 2341.763501] [] dmu_tx_assign+0x95/0xc90 [zfs]
[ 2341.763520] [] zfs_inactive+0x15b/0x210 [zfs]
[ 2341.763530] [] zpl_vap_init+0x64f/0x7b0 [zfs]
[ 2341.763532] [] evict+0xab/0x1a0
[ 2341.763534] [] dispose_list+0x31/0x40
[ 2341.763535] [] prune_icache_sb+0x42/0x50
[ 2341.763538] [] super_cache_scan+0x100/0x170
[ 2341.763540] [] shrink_slab_node+0x14b/0x2f0
[ 2341.763543] [] ? css_next_descendant_pre+0x1f/0x60
[ 2341.763545] [] shrink_slab+0x86/0x180
[ 2341.763547] [] kswapd_shrink_zone+0x125/0x1c0
[ 2341.763550] [] kswapd+0x4c6/0x890
[ 2341.763552] [] ? mem_cgroup_shrink_node_zone+0x180/0x180
[ 2341.763555] [] kthread+0xdf/0x100
[ 2341.763557] [] ? arch_vtime_task_switch+0x8e/0xa0
[ 2341.763559] [] ? kthread_create_on_node+0x1a0/0x1a0
[ 2341.763561] [] ret_from_fork+0x7c/0xb0
[ 2341.763563] [] ? kthread_create_on_node+0x1a0/0x1a0
[ 2341.763571] INFO: task z_rd_int/0:691 blocked for more than 20 seconds.
[ 2341.763572] Tainted: P W O 3.14.2-aufs #4
[ 2341.763573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2341.763574] z_rd_int/0 D ffff88012fa13280 5304 691 2 0x00000000
[ 2341.763576] ffff880035d25cd8 0000000000000002 ffff8800c651c6f0 ffff880035d25fd8
[ 2341.763578] 0000000000013280 0000000000013280 ffff8800c651c6f0 ffff8800c6d1b890
[ 2341.763580] ffff8800c6d1b890 ffff8800c651c6f0 ffff8800c6d1b898 0000000000000246
[ 2341.763582] Call Trace:
[ 2341.763584] [] schedule_preempt_disabled+0x2e/0x80
[ 2341.763586] [] __mutex_lock_slowpath+0x12f/0x3a0
[ 2341.763588] [] mutex_lock+0x23/0x33
[ 2341.763599] [] vdev_queue_io_done+0x46/0x36d0 [zfs]
[ 2341.763601] [] ? mutex_unlock+0x9/0x10
[ 2341.763611] [] zio_buf_free+0x960/0x17b0 [zfs]
[ 2341.763620] [] zio_execute+0xa6/0x140 [zfs]
[ 2341.763624] [] taskq_cancel_id+0x2e8/0x490 [spl]
[ 2341.763626] [] ? wake_up_state+0x10/0x10
[ 2341.763629] [] ? taskq_cancel_id+0x130/0x490 [spl]
[ 2341.763631] [] kthread+0xdf/0x100
[ 2341.763633] [] ? arch_vtime_task_switch+0x8e/0xa0
[ 2341.763635] [] ? kthread_create_on_node+0x1a0/0x1a0
[ 2341.763637] [] ret_from_fork+0x7c/0xb0
[ 2341.763638] [] ? kthread_create_on_node+0x1a0/0x1a0
[ 2341.763640] INFO: task z_rd_int/1:692 blocked for more than 20 seconds.
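When comparing dumps like the one above (or the ones in the linked gists), it can help to reduce each hung-task report to the task name and the innermost `[zfs]` or `[spl]` frame it is stuck in. The Python sketch below does that with two regular expressions; it assumes the frame format shown in this log (`symbol+0xoff/0xsize [module]`), skips frames marked `?` because the kernel flags those as unreliable, and uses hypothetical names (`HungTask`, `parse_hung_tasks`). Symbol names are taken as printed, without checking whether they were resolved correctly.

```python
import re
from dataclasses import dataclass, field

# Header line emitted by the kernel's hung-task detector.
HUNG_RE = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame: optional "?" marker, symbol+offset/size, optional [module].
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


@dataclass
class HungTask:
    task: str
    pid: int
    seconds: int
    # (function, module or None) for each reliable frame, innermost first.
    frames: list = field(default_factory=list)

    def first_frame_in(self, modules=("zfs", "spl")):
        """Innermost reliable frame that belongs to one of the given modules."""
        for func, module in self.frames:
            if module in modules:
                return func
        return None


def parse_hung_tasks(lines):
    """Group a dmesg excerpt into one HungTask per 'blocked for more than' report."""
    reports = []
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            reports.append(HungTask(header["task"], int(header["pid"]), int(header["secs"])))
            continue
        frame = FRAME_RE.search(line)
        if reports and frame and not frame["unreliable"]:
            reports[-1].frames.append((frame["func"], frame["module"]))
    return reports
```

Applied to this log, the first three reports would point at `vdev_queue_io` (systemd), `__cv_destroy` (kswapd0) and `vdev_queue_io_done` (z_rd_int/0), which makes it easier to see that systemd and z_rd_int/0 are both waiting on a mutex taken in the vdev queue code.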