Random hangs/kernel panics #842

Closed
alexclear opened this issue Jul 19, 2012 · 17 comments

@alexclear

ZFS/SPL 0.6.0-rc9 on kernel version 2.6.32-042stab055.16-el6-openvz, 16G of RAM, one pool consisting of a single mirror:

  pool: home
 state: ONLINE
 scan: none requested
config:

    NAME                                       STATE     READ WRITE CKSUM
    home                                       ONLINE       0     0     0
      mirror-0                                 ONLINE       0     0     0
        scsi-SATA_ST33000651AS_Z290Q85N-part4  ONLINE       0     0     0
        scsi-SATA_ST33000651AS_Z290Y917-part4  ONLINE       0     0     0

errors: No known data errors

The system runs a production site under moderate load. Recently we started to get about one hang/reboot per day.
I can't find any correlation between the hangs and load or external events; they seem to be random.
I was able to capture stack traces today:

Jul 20 01:23:06 heloderma kernel: [33222.763007] INFO: task bdi-default:49 blocked for more than 120 seconds.
Jul 20 01:23:06 heloderma kernel: [33222.764385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:06 heloderma kernel: [33222.765774] bdi-default D ffff88043e3f7210 0 49 2 0 0x00000000
Jul 20 01:23:06 heloderma kernel: [33222.767091] ffff88043e3f9890 0000000000000046 ffffffffa050b802 ffff88043e3f9860
Jul 20 01:23:06 heloderma kernel: [33222.768336] 0000000000000020 ffff88014c1d52f0 ffff8803a9110200 0000000000000000
Jul 20 01:23:06 heloderma kernel: [33222.769325] ffff8804000003b1 ffff88043e3f77c8 ffff88043e3f9fd8 ffff88043e3f9fd8
Jul 20 01:23:06 heloderma kernel: [33222.770292] Call Trace:
Jul 20 01:23:06 heloderma kernel: [33222.771258] [] ? prepare_to_wait_exclusive+0x4e/0x80
Jul 20 01:23:06 heloderma kernel: [33222.772235] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:06 heloderma kernel: [33222.773224] [] ? dsl_dir_tempreserve_space+0xe2/0x1f0 [zfs]
Jul 20 01:23:06 heloderma kernel: [33222.774189] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:06 heloderma kernel: [33222.775153] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:06 heloderma kernel: [33222.776127] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776150] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776175] [] zfs_putpage+0x24b/0x260 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776199] [] zpl_putpage+0x35/0x60 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776202] [] write_cache_pages+0x1cb/0x480
Jul 20 01:23:07 heloderma kernel: [33222.776224] [] ? zpl_readpages+0x1f/0x20 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776246] [] ? zpl_putpage+0x0/0x60 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776249] [] ? apic_timer_interrupt+0xe/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776251] [] ? apic_timer_interrupt+0xe/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776272] [] zpl_writepages+0x18/0x20 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776275] [] do_writepages+0x21/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776279] [] __writeback_single_inode+0xdd/0x2c0
Jul 20 01:23:07 heloderma kernel: [33222.776282] [] writeback_single_inode+0x83/0xc0
Jul 20 01:23:07 heloderma kernel: [33222.776285] [] ? iput+0x30/0x70
Jul 20 01:23:07 heloderma kernel: [33222.776287] [] writeback_sb_inodes+0xf1/0x210
Jul 20 01:23:07 heloderma kernel: [33222.776290] [] writeback_inodes_wb+0x150/0x1a0
Jul 20 01:23:07 heloderma kernel: [33222.776293] [] wb_writeback+0x27b/0x420
Jul 20 01:23:07 heloderma kernel: [33222.776296] [] wb_do_writeback+0xbf/0x250
Jul 20 01:23:07 heloderma kernel: [33222.776299] [] bdi_forker_task+0x6a/0x300
Jul 20 01:23:07 heloderma kernel: [33222.776302] [] ? bdi_forker_task+0x0/0x300
Jul 20 01:23:07 heloderma kernel: [33222.776305] [] kthread+0x96/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776307] [] child_rip+0xa/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776310] [] ? kthread+0x0/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776312] [] ? child_rip+0x0/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776318] INFO: task kswapd0:100 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.776320] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.776321] kswapd0 D ffff88043a10ae50 0 100 2 0 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.776324] ffff88043a151640 0000000000000046 00000007a050b802 0000000000000001
Jul 20 01:23:07 heloderma kernel: [33222.776327] 0000000000000020 0000000000000086 ffff8800b739cc80 ffff8803c7e03a78
Jul 20 01:23:07 heloderma kernel: [33222.776330] 0000000000000001 ffff88043a10b408 ffff88043a151fd8 ffff88043a151fd8
Jul 20 01:23:07 heloderma kernel: [33222.776333] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.776335] [] ? prepare_to_wait_exclusive+0x4e/0x80
Jul 20 01:23:07 heloderma kernel: [33222.776345] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776348] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776357] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776380] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776401] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776423] [] zfs_putpage+0x24b/0x260 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776445] [] zpl_putpage+0x56/0x60 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776466] [] zpl_writepage+0x12/0x20 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776470] [] pageout+0x14d/0x390
Jul 20 01:23:07 heloderma kernel: [33222.776472] [] shrink_page_list+0x6c4/0x9e0
Jul 20 01:23:07 heloderma kernel: [33222.776475] [] shrink_inactive_list+0x3c7/0xb30
Jul 20 01:23:07 heloderma kernel: [33222.776477] [] ? move_active_pages_to_lru+0x215/0x2b0
Jul 20 01:23:07 heloderma kernel: [33222.776481] [] ? ktime_get+0x63/0xe0
Jul 20 01:23:07 heloderma kernel: [33222.776483] [] ? shrink_active_list+0x3e3/0x4a0
Jul 20 01:23:07 heloderma kernel: [33222.776486] [] shrink_zone+0x5d8/0x9d0
Jul 20 01:23:07 heloderma kernel: [33222.776489] [] ? zone_watermark_ok_safe+0xad/0xc0
Jul 20 01:23:07 heloderma kernel: [33222.776491] [] balance_pgdat+0x739/0x820
Jul 20 01:23:07 heloderma kernel: [33222.776493] [] ? isolate_pages_global+0x0/0x520
Jul 20 01:23:07 heloderma kernel: [33222.776496] [] ? zone_watermark_ok_safe+0xad/0xc0
Jul 20 01:23:07 heloderma kernel: [33222.776498] [] kswapd+0x131/0x3a0
Jul 20 01:23:07 heloderma kernel: [33222.776501] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776503] [] ? kswapd+0x0/0x3a0
Jul 20 01:23:07 heloderma kernel: [33222.776506] [] kthread+0x96/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776509] [] child_rip+0xa/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776512] [] ? kthread+0x0/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776514] [] ? child_rip+0x0/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776528] INFO: task txg_quiesce:1278 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.776529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.776531] txg_quiesce D ffff88042c6946c0 0 1278 2 0 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.776533] ffff880408fe7d60 0000000000000046 ffff880408fe7d28 ffff880408fe7d30
Jul 20 01:23:07 heloderma kernel: [33222.776536] ffff880408fe7d30 ffffffff814eab80 ffff88043f808100 ffff88002c31e240
Jul 20 01:23:07 heloderma kernel: [33222.776539] 000000000000af5c ffff88042c694c78 ffff880408fe7fd8 ffff880408fe7fd8
Jul 20 01:23:07 heloderma kernel: [33222.776541] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.776544] [] ? __mutex_lock_slowpath+0x70/0x180
Jul 20 01:23:07 heloderma kernel: [33222.776554] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776557] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776561] [] ? __bitmap_weight+0x8c/0xb0
Jul 20 01:23:07 heloderma kernel: [33222.776570] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776593] [] txg_quiesce_thread+0x1fb/0x340 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776598] [] ? set_user_nice+0xc9/0x130
Jul 20 01:23:07 heloderma kernel: [33222.776620] [] ? txg_quiesce_thread+0x0/0x340 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776629] [] thread_generic_wrapper+0x68/0x80 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776637] [] ? thread_generic_wrapper+0x0/0x80 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776640] [] kthread+0x96/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776642] [] child_rip+0xa/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776645] [] ? kthread+0x0/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776647] [] ? child_rip+0x0/0x20
Jul 20 01:23:07 heloderma kernel: [33222.776650] INFO: task zabbix_agentd:2608 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.776652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.776653] zabbix_agentd D ffff88043a80b010 0 2608 2600 0 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.776656] ffff88043bcf5158 0000000000000086 0000000000000000 0000000000000001
Jul 20 01:23:07 heloderma kernel: [33222.776659] 0000000000000020 0000000000000082 ffff880117dde140 ffff880408fe7d70
Jul 20 01:23:07 heloderma kernel: [33222.776661] 0000000000000001 ffff88043a80b5c8 ffff88043bcf5fd8 ffff88043bcf5fd8
Jul 20 01:23:07 heloderma kernel: [33222.776664] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.776673] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776676] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776684] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776707] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776728] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776750] [] zfs_putpage+0x24b/0x260 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776772] [] zpl_putpage+0x56/0x60 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776793] [] zpl_writepage+0x12/0x20 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776796] [] pageout+0x14d/0x390
Jul 20 01:23:07 heloderma kernel: [33222.776798] [] shrink_page_list+0x6c4/0x9e0
Jul 20 01:23:07 heloderma kernel: [33222.776803] [] shrink_inactive_list+0x3c7/0xb30
Jul 20 01:23:07 heloderma kernel: [33222.776805] [] ? move_active_pages_to_lru+0x215/0x2b0
Jul 20 01:23:07 heloderma kernel: [33222.776808] [] ? shrink_active_list+0x3e3/0x4a0
Jul 20 01:23:07 heloderma kernel: [33222.776812] [] shrink_zone+0x5d8/0x9d0
Jul 20 01:23:07 heloderma kernel: [33222.776814] [] ? ktime_get_ts+0xa9/0xe0
Jul 20 01:23:07 heloderma kernel: [33222.776817] [] do_try_to_free_pages+0x2a0/0x7f0
Jul 20 01:23:07 heloderma kernel: [33222.776819] [] try_to_free_pages+0xa0/0x130
Jul 20 01:23:07 heloderma kernel: [33222.776821] [] ? isolate_pages_global+0x0/0x520
Jul 20 01:23:07 heloderma kernel: [33222.776824] [] __alloc_pages_nodemask+0x5f0/0xb50
Jul 20 01:23:07 heloderma kernel: [33222.776829] [] alloc_pages_vma+0x9a/0x150
Jul 20 01:23:07 heloderma kernel: [33222.776831] [] handle_pte_fault+0xa87/0xf60
Jul 20 01:23:07 heloderma kernel: [33222.776835] [] ? bit_waitqueue+0x17/0xc0
Jul 20 01:23:07 heloderma kernel: [33222.776838] [] ? inotify_d_instantiate+0x2a/0x60
Jul 20 01:23:07 heloderma kernel: [33222.776841] [] handle_mm_fault+0x1e4/0x2b0
Jul 20 01:23:07 heloderma kernel: [33222.776844] [] __do_page_fault+0x139/0x490
Jul 20 01:23:07 heloderma kernel: [33222.776847] [] ? seq_printf+0x58/0x90
Jul 20 01:23:07 heloderma kernel: [33222.776850] [] ? show_stat+0x631/0x640
Jul 20 01:23:07 heloderma kernel: [33222.776854] [] do_page_fault+0x3e/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.776856] [] page_fault+0x25/0x30
Jul 20 01:23:07 heloderma kernel: [33222.776862] [] ? copy_user_generic_string+0x2d/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776864] [] ? seq_read+0x2ae/0x400
Jul 20 01:23:07 heloderma kernel: [33222.776868] [] proc_reg_read+0x7e/0xc0
Jul 20 01:23:07 heloderma kernel: [33222.776871] [] vfs_read+0xb5/0x1a0
Jul 20 01:23:07 heloderma kernel: [33222.776873] [] sys_read+0x51/0x90
Jul 20 01:23:07 heloderma kernel: [33222.776876] [] system_call_fastpath+0x16/0x1b
Jul 20 01:23:07 heloderma kernel: [33222.776881] INFO: task nginx:2846 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.776882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.776884] nginx D ffff88043bfe2500 0 2846 2845 0 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.776886] ffff880409529128 0000000000000082 0000000000000000 0000000000000001
Jul 20 01:23:07 heloderma kernel: [33222.776889] 0000000000000020 0000000000000082 ffff8801985a3540 ffff880408fe7d70
Jul 20 01:23:07 heloderma kernel: [33222.776892] 0000000000000001 ffff88043bfe2ab8 ffff880409529fd8 ffff880409529fd8
Jul 20 01:23:07 heloderma kernel: [33222.776894] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.776903] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776906] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.776914] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.776937] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776958] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.776980] [] zfs_putpage+0x24b/0x260 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777004] [] zpl_putpage+0x56/0x60 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777025] [] zpl_writepage+0x12/0x20 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777028] [] pageout+0x14d/0x390
Jul 20 01:23:07 heloderma kernel: [33222.777031] [] shrink_page_list+0x6c4/0x9e0
Jul 20 01:23:07 heloderma kernel: [33222.777033] [] shrink_inactive_list+0x3c7/0xb30
Jul 20 01:23:07 heloderma kernel: [33222.777035] [] ? move_active_pages_to_lru+0x215/0x2b0
Jul 20 01:23:07 heloderma kernel: [33222.777038] [] ? shrink_active_list+0x3e3/0x4a0
Jul 20 01:23:07 heloderma kernel: [33222.777040] [] shrink_zone+0x5d8/0x9d0
Jul 20 01:23:07 heloderma kernel: [33222.777043] [] ? ktime_get_ts+0xa9/0xe0
Jul 20 01:23:07 heloderma kernel: [33222.777045] [] do_try_to_free_pages+0x2a0/0x7f0
Jul 20 01:23:07 heloderma kernel: [33222.777052] [] ? br_dev_xmit+0xb8/0x130 [bridge]
Jul 20 01:23:07 heloderma kernel: [33222.777055] [] try_to_free_pages+0xa0/0x130
Jul 20 01:23:07 heloderma kernel: [33222.777057] [] ? isolate_pages_global+0x0/0x520
Jul 20 01:23:07 heloderma kernel: [33222.777060] [] __alloc_pages_nodemask+0x5f0/0xb50
Jul 20 01:23:07 heloderma kernel: [33222.777063] [] alloc_pages_current+0xaa/0x120
Jul 20 01:23:07 heloderma kernel: [33222.777067] [] default_file_splice_read+0x129/0x310
Jul 20 01:23:07 heloderma kernel: [33222.777070] [] ? _spin_unlock_bh+0x1b/0x20
Jul 20 01:23:07 heloderma kernel: [33222.777074] [] ? release_sock+0xce/0xe0
Jul 20 01:23:07 heloderma kernel: [33222.777078] [] ? tcp_sendmsg+0x70e/0xb80
Jul 20 01:23:07 heloderma kernel: [33222.777081] [] ? sock_aio_write+0x139/0x150
Jul 20 01:23:07 heloderma kernel: [33222.777085] [] ? sock_aio_write+0x0/0x150
Jul 20 01:23:07 heloderma kernel: [33222.777088] [] ? do_sync_readv_writev+0xfb/0x140
Jul 20 01:23:07 heloderma kernel: [33222.777109] [] ? zpl_open+0x71/0x90 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777112] [] ? spd_release_page+0x0/0x20
Jul 20 01:23:07 heloderma kernel: [33222.777114] [] do_splice_to+0x6b/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.777117] [] splice_direct_to_actor+0xaf/0x1c0
Jul 20 01:23:07 heloderma kernel: [33222.777120] [] ? direct_splice_actor+0x0/0x30
Jul 20 01:23:07 heloderma kernel: [33222.777123] [] do_splice_direct+0x4d/0x60
Jul 20 01:23:07 heloderma kernel: [33222.777125] [] do_sendfile+0x18c/0x1f0
Jul 20 01:23:07 heloderma kernel: [33222.777127] [] sys_sendfile64+0x81/0xb0
Jul 20 01:23:07 heloderma kernel: [33222.777130] [] system_call_fastpath+0x16/0x1b
Jul 20 01:23:07 heloderma kernel: [33222.777133] INFO: task mysqld:3402 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.777134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.777136] mysqld D ffff88043cba07c0 0 3402 2861 0 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.777138] ffff8804395d2f48 0000000000000082 0000000000000000 0000000000000001
Jul 20 01:23:07 heloderma kernel: [33222.777141] 0000000000000020 0000000000000082 ffff88019da94f00 ffff880408fe7d70
Jul 20 01:23:07 heloderma kernel: [33222.777143] 0000000000000001 ffff88043cba0d78 ffff8804395d3fd8 ffff8804395d3fd8
Jul 20 01:23:07 heloderma kernel: [33222.777146] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.777154] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777158] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777166] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777189] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777210] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777232] [] zfs_putpage+0x24b/0x260 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777253] [] zpl_putpage+0x56/0x60 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777273] [] zpl_writepage+0x12/0x20 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777276] [] pageout+0x14d/0x390
Jul 20 01:23:07 heloderma kernel: [33222.777279] [] shrink_page_list+0x6c4/0x9e0
Jul 20 01:23:07 heloderma kernel: [33222.777281] [] shrink_inactive_list+0x3c7/0xb30
Jul 20 01:23:07 heloderma kernel: [33222.777290] [] ? spl_slab_reclaim+0x6e/0x3f0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777298] [] ? spl_slab_reclaim+0x6e/0x3f0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777303] [] shrink_zone+0x5d8/0x9d0
Jul 20 01:23:07 heloderma kernel: [33222.777306] [] ? ktime_get_ts+0xa9/0xe0
Jul 20 01:23:07 heloderma kernel: [33222.777308] [] do_try_to_free_pages+0x2a0/0x7f0
Jul 20 01:23:07 heloderma kernel: [33222.777311] [] try_to_free_pages+0xa0/0x130
Jul 20 01:23:07 heloderma kernel: [33222.777313] [] ? isolate_pages_global+0x0/0x520
Jul 20 01:23:07 heloderma kernel: [33222.777316] [] __alloc_pages_nodemask+0x5f0/0xb50
Jul 20 01:23:07 heloderma kernel: [33222.777319] [] alloc_pages_vma+0x9a/0x150
Jul 20 01:23:07 heloderma kernel: [33222.777322] [] handle_pte_fault+0xa87/0xf60
Jul 20 01:23:07 heloderma kernel: [33222.777325] [] ? __switch_to+0x26e/0x320
Jul 20 01:23:07 heloderma kernel: [33222.777327] [] handle_mm_fault+0x1e4/0x2b0
Jul 20 01:23:07 heloderma kernel: [33222.777330] [] __do_page_fault+0x139/0x490
Jul 20 01:23:07 heloderma kernel: [33222.777332] [] ? schedule_hrtimeout_range+0x145/0x170
Jul 20 01:23:07 heloderma kernel: [33222.777334] [] ? add_wait_queue+0x46/0x60
Jul 20 01:23:07 heloderma kernel: [33222.777337] [] ? remove_wait_queue+0x3c/0x50
Jul 20 01:23:07 heloderma kernel: [33222.777339] [] ? free_poll_entry+0x26/0x30
Jul 20 01:23:07 heloderma kernel: [33222.777341] [] ? poll_freewait+0x3d/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.777344] [] do_page_fault+0x3e/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.777347] [] page_fault+0x25/0x30
Jul 20 01:23:07 heloderma kernel: [33222.777349] [] ? copy_user_generic_string+0x2d/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777352] [] ? ii_iovec_copy_to_user+0x73/0xf0
Jul 20 01:23:07 heloderma kernel: [33222.777355] [] file_read_iter_actor+0x45/0x80
Jul 20 01:23:07 heloderma kernel: [33222.777358] [] generic_file_read_iter+0x2a2/0x680
Jul 20 01:23:07 heloderma kernel: [33222.777371] [] ? ext4_file_open+0x5f/0x130 [ext4]
Jul 20 01:23:07 heloderma kernel: [33222.777374] [] generic_file_aio_read+0x8b/0xa0
Jul 20 01:23:07 heloderma kernel: [33222.777377] [] do_sync_read+0xfa/0x140
Jul 20 01:23:07 heloderma kernel: [33222.777380] [] ? mmap_region+0x347/0x770
Jul 20 01:23:07 heloderma kernel: [33222.777383] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777386] [] ? do_mmap_pgoff+0x33a/0x380
Jul 20 01:23:07 heloderma kernel: [33222.777389] [] vfs_read+0xb5/0x1a0
Jul 20 01:23:07 heloderma kernel: [33222.777391] [] sys_read+0x51/0x90
Jul 20 01:23:07 heloderma kernel: [33222.777394] [] system_call_fastpath+0x16/0x1b
Jul 20 01:23:07 heloderma kernel: [33222.777406] INFO: task lsyncd:8886 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.777408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.777409] lsyncd D ffff8803cf88f0d0 0 8886 8422 104 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.777412] ffff8803c7e03b38 0000000000000082 0000000000000000 0000000000000001
Jul 20 01:23:07 heloderma kernel: [33222.777415] 0000000000000020 0000000000000086 ffff88010ba08600 ffff880408fe7d70
Jul 20 01:23:07 heloderma kernel: [33222.777417] 0000000000000001 ffff8803cf88f688 ffff8803c7e03fd8 ffff8803c7e03fd8
Jul 20 01:23:07 heloderma kernel: [33222.777420] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.777428] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777431] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777440] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777462] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777483] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777505] [] zfs_write+0x3be/0xc90 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777508] [] ? lru_cache_add_lru+0x21/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777511] [] ? page_add_new_anon_rmap+0x9d/0xf0
Jul 20 01:23:07 heloderma kernel: [33222.777532] [] zpl_write_common+0x52/0x70 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777553] [] zpl_write+0x68/0xa0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777556] [] ? thread_return+0x4e/0x7d0
Jul 20 01:23:07 heloderma kernel: [33222.777559] [] vfs_write+0xb8/0x1a0
Jul 20 01:23:07 heloderma kernel: [33222.777561] [] sys_write+0x51/0x90
Jul 20 01:23:07 heloderma kernel: [33222.777564] [] system_call_fastpath+0x16/0x1b
Jul 20 01:23:07 heloderma kernel: [33222.777567] INFO: task openvpn:9229 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.777568] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.777570] openvpn D ffff8803b7e004c0 0 9229 8422 104 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.777572] ffff8803987a3b38 0000000000000082 ffff880132cb7e40 ffff8803987a3b08
Jul 20 01:23:07 heloderma kernel: [33222.777575] 0000000000000020 ffff8802fb5fb670 ffff88007b8fe300 0000000000000000
Jul 20 01:23:07 heloderma kernel: [33222.777577] ffff880300000d7a ffff8803b7e00a78 ffff8803987a3fd8 ffff8803987a3fd8
Jul 20 01:23:07 heloderma kernel: [33222.777580] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.777582] [] ? prepare_to_wait_exclusive+0x4e/0x80
Jul 20 01:23:07 heloderma kernel: [33222.777591] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777612] [] ? dsl_dir_tempreserve_space+0xe2/0x1f0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777615] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777624] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777646] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777667] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777688] [] zfs_write+0x3be/0xc90 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777692] [] ? sock_recvmsg+0x121/0x150
Jul 20 01:23:07 heloderma kernel: [33222.777694] [] ? sock_sendmsg+0x10d/0x140
Jul 20 01:23:07 heloderma kernel: [33222.777697] [] ? __kmalloc+0x22f/0x270
Jul 20 01:23:07 heloderma kernel: [33222.777706] [] ? kmem_free_debug+0x4b/0x150 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777728] [] zpl_write_common+0x52/0x70 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777749] [] zpl_write+0x68/0xa0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777752] [] vfs_write+0xb8/0x1a0
Jul 20 01:23:07 heloderma kernel: [33222.777754] [] sys_write+0x51/0x90
Jul 20 01:23:07 heloderma kernel: [33222.777757] [] system_call_fastpath+0x16/0x1b
Jul 20 01:23:07 heloderma kernel: [33222.777759] INFO: task openvpn:9236 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.777761] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.777762] openvpn D ffff8803b7e00f90 0 9236 8422 104 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.777765] ffff880394d0bb38 0000000000000086 0000000000000000 ffff880394d0bb08
Jul 20 01:23:07 heloderma kernel: [33222.777768] 0000000000000020 ffff8802fb5fb670 ffff88008fd96500 0000000000000000
Jul 20 01:23:07 heloderma kernel: [33222.777770] ffff880300000c42 ffff8803b7e01548 ffff880394d0bfd8 ffff880394d0bfd8
Jul 20 01:23:07 heloderma kernel: [33222.777773] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.777781] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777802] [] ? dsl_dir_tempreserve_space+0xe2/0x1f0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777806] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.777815] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777837] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777858] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777882] [] zfs_write+0x3be/0xc90 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777885] [] ? sock_sendmsg+0x10d/0x140
Jul 20 01:23:07 heloderma kernel: [33222.777888] [] ? __kmalloc+0x22f/0x270
Jul 20 01:23:07 heloderma kernel: [33222.777897] [] ? kmem_free_debug+0x4b/0x150 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777919] [] zpl_write_common+0x52/0x70 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777939] [] zpl_write+0x68/0xa0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.777941] [] ? ktime_get_ts+0xa9/0xe0
Jul 20 01:23:07 heloderma kernel: [33222.777944] [] vfs_write+0xb8/0x1a0
Jul 20 01:23:07 heloderma kernel: [33222.777946] [] sys_write+0x51/0x90
Jul 20 01:23:07 heloderma kernel: [33222.777949] [] ? sys_poll+0x7c/0x110
Jul 20 01:23:07 heloderma kernel: [33222.777952] [] system_call_fastpath+0x16/0x1b
Jul 20 01:23:07 heloderma kernel: [33222.777955] INFO: task qmgr:9396 blocked for more than 120 seconds.
Jul 20 01:23:07 heloderma kernel: [33222.777956] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 01:23:07 heloderma kernel: [33222.777957] qmgr D ffff88038a13c080 0 9396 9371 104 0x00000000
Jul 20 01:23:07 heloderma kernel: [33222.777960] ffff88038a1bbab8 0000000000000086 0000000000000000 ffff88038a1bba88
Jul 20 01:23:07 heloderma kernel: [33222.777963] 0000000000000020 ffff880394f39830 ffff88010ed6cec0 0000000000000000
Jul 20 01:23:07 heloderma kernel: [33222.777965] ffff880300000c60 ffff88038a13c638 ffff88038a1bbfd8 ffff88038a1bbfd8
Jul 20 01:23:07 heloderma kernel: [33222.777968] Call Trace:
Jul 20 01:23:07 heloderma kernel: [33222.777976] [] cv_wait_common+0x9c/0x1a0 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.777998] [] ? dsl_dir_tempreserve_space+0xe2/0x1f0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.778002] [] ? autoremove_wake_function+0x0/0x40
Jul 20 01:23:07 heloderma kernel: [33222.778011] [] __cv_wait+0x13/0x20 [spl]
Jul 20 01:23:07 heloderma kernel: [33222.778034] [] txg_wait_open+0x8b/0x110 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.778054] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.778076] [] zfs_rename+0x43c/0xda0 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.778097] [] zpl_rename+0x5e/0x90 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.778100] [] vfs_rename+0x408/0x450
Jul 20 01:23:07 heloderma kernel: [33222.778102] [] ? __lookup_hash+0x102/0x160
Jul 20 01:23:07 heloderma kernel: [33222.778105] [] sys_renameat+0x22e/0x260
Jul 20 01:23:07 heloderma kernel: [33222.778126] [] ? zfs_getattr_fast+0xe0/0x160 [zfs]
Jul 20 01:23:07 heloderma kernel: [33222.778130] [] ? cp_new_stat+0xe4/0x100
Jul 20 01:23:07 heloderma kernel: [33222.778132] [] ? sys_newlstat+0x36/0x50
Jul 20 01:23:07 heloderma kernel: [33222.778135] [] sys_rename+0x1b/0x20
Jul 20 01:23:07 heloderma kernel: [33222.778138] [] system_call_fastpath+0x16/0x1b
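
Almost every blocked task in the traces above is waiting in `txg_wait_open` via `dmu_tx_wait`, so it can help to group the tasks by the ZFS function that called into that path. The following is a minimal Python sketch for doing so, assuming the hung-task log format shown above. The function name `group_hung_tasks`, the regular expressions, and the choice of which frames to treat as common are illustrative assumptions, not part of any ZFS or kernel tooling.

```python
import re
from collections import defaultdict

# Hung-task header, e.g. "INFO: task nginx:2846 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# A reliable frame (no leading "?") that belongs to the zfs module.
# The address inside the brackets is optional because it is missing
# from the traces as pasted in this report.
FRAME_RE = re.compile(
    r"\[(?:<[0-9a-f]+>)?\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[zfs\]"
)

# Frames shared by most blocked tasks; skipping them makes the grouping
# key the first zfs function that actually differs between tasks.
COMMON = {"txg_wait_open", "dmu_tx_wait"}


def group_hung_tasks(lines):
    """Map each ZFS function to the (task, pid) pairs blocked beneath it."""
    groups = defaultdict(list)
    current = None
    for line in lines:
        header = TASK_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            continue
        if current is None:
            continue
        frame = FRAME_RE.search(line)
        if frame and frame.group(1) not in COMMON:
            groups[frame.group(1)].append(current)
            current = None  # only the first distinguishing frame counts
    return dict(groups)
```

Passing the lines of a saved kernel log to `group_hung_tasks` returns a dictionary keyed by function name, which makes it easier to see at a glance how many tasks were blocked in page writeback compared with ordinary writes or renames.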

@ryao
Contributor

ryao commented Jul 21, 2012

Is this a Funtoo Linux system?

I ask because Funtoo Linux uses the same ebuilds as Gentoo Linux and these backtraces look like a bug that I introduced into Gentoo's ebuilds by mistake. It was fixed within a week, but some systems were likely affected. Your backtraces suggest to me that yours is one of them. If you are using Funtoo Linux, resyncing and building the latest spl and zfs ebuilds should resolve this problem.

@alexclear
Author

No, this is Debian 6.0.3 with a custom kernel (basically the RHEL6/CentOS6 kernel with OpenVZ support, repackaged for Debian). I built the ZFS/SPL deb packages from the -rc9 tarballs on the site. Should I try the latest code from git master?

@ryao
Copy link
Contributor

ryao commented Jul 21, 2012

There have been no commits that appear to solve this issue. With that said, this could be caused by PF_MEMALLOC. I have been working on eliminating the usage of PF_MEMALLOC from the code. You could try a checkout of the gentoo branch from my Git repositories and see if that helps:

https://github.com/ryao/spl
https://github.com/ryao/zfs

There is also a kernel patch that is needed to fully eliminate PF_MEMALLOC usage, although it probably won't apply cleanly to your kernel. It shouldn't require too much effort to rectify that though:

https://bugs.gentoo.org/show_bug.cgi?id=416685

The version that I marked as obsolete probably will apply with the fewest failures. I have not had time to look into making a version that applies against OpenVZ patched RHEL kernels, but LLNL is applying this patch to their RHEL6 systems, so @behlendorf might have a version that would apply more cleanly to your kernel.

@phillipp

We have the same problem in issue #837 but no end in sight...

@alexclear
Author

@ryao I've adapted the patch to the RHEL6 kernel and compiled spl and zfs from your gentoo tree. I got something different today:

top - 08:52:08 up  7:14,  3 users,  load average: 76.49, 71.02, 59.15
Tasks: 558 total,   1 running, 557 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 54.0%sy,  0.0%ni, 23.7%id, 22.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16339760k total, 16149800k used,   189960k free,    10088k buffers
Swap: 33553328k total,        0k used, 33553328k free,   292652k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   1190 root       0 -20     0    0    0 S   12  0.0   5:53.70 txg_sync
   1189 root       0 -20     0    0    0 D    9  0.0   3:59.18 txg_quiesce
 201227 root      20   0 23568  532  272 D    6  0.0   0:46.17 cron
 201228 root      20   0 20900  492  256 D    5  0.0   0:41.28 cron
 201372 root      20   0 20900  488  256 D    5  0.0   0:37.81 cron
   5501 Debian-e  20   0 39388  712  188 D    5  0.0   0:59.53 qmgr
   6608 root      20   0   136   40   20 D    4  0.0   0:44.81 svlogd
 157231 root      20   0  6112  392  244 D    4  0.0   1:18.06 syslogd
   5910 sshd      20   0  474m 275m 1180 S    4  1.7   1:03.78 mysqld
   6155 root      20   0  148m 5180  104 D    4  0.0   0:36.37 apache2

Basically, most processes were stuck in the D state.
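
For anyone who wants to capture the same picture without a `top` screenshot, here is a minimal sketch that lists tasks in uninterruptible sleep by reading `/proc/<pid>/stat`. The function names `parse_stat` and `d_state_tasks` are made up for this illustration, and it only looks at each process's main thread, so it may undercount on threaded workloads.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state

def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `d_state_tasks()` on the affected box and logging the result every few seconds should show whether `txg_sync` or `txg_quiesce` enters the D state before the userspace writers do.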

@ryao
Contributor

ryao commented Jul 23, 2012

Is there any chance that you could post the zfs.ko kernel module somewhere online with an accompanying panic message? That would let me disassemble it to get a better idea of where the NULL pointer dereference occurs.

@alexclear
Author

I just got the same D-state situation again, here is a list of processes that were in the D-state:

    PID TTY      STAT   TIME COMMAND
     49 ?        D      0:05  \_ [bdi-default]
    100 ?        D      0:18  \_ [kswapd0]
    426 ?        D      0:19  \_ [md2_raid1]
    482 ?        D      0:00  \_ [flush-9:2]
   1055 ?        D<     0:23  \_ [txg_sync]
 207810 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<boombate.com;lemur.boombate.com>]
 207812 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<boombate.com;manul.boombate.com>]
 207813 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<boombate.com;koala.boombate.com>]
 207814 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<boombate.com;monstera.boombate.com>]
 207815 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<boombate.com;chameleon.boombate.com>]
 207816 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<boombate.com;panda.boombate.com>]
 207818 ?        D      0:00  |                   \_ /usr/share/munin/munin-update [Munin::Master::UpdateWorker<serverclub.com;d2595.serverclub.com>]
 199621 ?        D      0:40  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 199622 ?        D      0:47  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 202886 ?        D      0:33  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 202912 ?        D      0:39  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 202913 ?        D      0:32  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 202927 ?        D      0:30  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 202928 ?        D      0:34  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 206371 ?        D      0:19  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 206376 ?        D      0:17  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 206377 ?        D      0:15  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208820 ?        D      0:00  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208852 ?        D      0:00  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208853 ?        D      0:00  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208959 ?        D      0:00  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208960 ?        D      0:00  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208961 ?        D      0:00  |   \_ /usr/sbin/apache2 -d /etc/apache2-boombate -k start
 208779 ?        Ds     0:00  |       \_ /usr/bin/perl -w /etc/munin/plugins/postfix_mailvolume
 207821 ?        D      0:00  \_ /usr/sbin/munin-node [10.222.0.3]
 208962 ?        Ds     0:00      \_ /usr/bin/perl -w /etc/munin/plugins/postfix_mailvolume

I also got some stacktraces this time:

2012-07-23_10:17:09.85486 [19310.593418] INFO: task apache2:199622 blocked for more than 120 seconds.
2012-07-23_10:17:09.85597 [19310.594505] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.85599 [19310.595563] apache2 D ffff880081c54600 0 199622 8729 104 0x00000000
2012-07-23_10:17:09.85696 [19310.596590] ffff880068505b58 0000000000000086 ffff880068505b48 01ff88006028d900
2012-07-23_10:17:09.85800 [19310.597621] ffff8802a94efcf0 ffff8802a94efcdc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.85903 [19310.598638] ffff88007671ca08 ffff880081c54bb8 ffff880068505fd8 ffff880068505fd8
2012-07-23_10:17:09.86108 [19310.599744] Call Trace:
2012-07-23_10:17:09.86215 [19310.600769] [] ? prepare_to_wait_exclusive+0x4e/0x80
2012-07-23_10:17:09.86323 [19310.601839] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.86433 [19310.602892] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.86533 [19310.603991] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.86644 [19310.605000] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.86751 [19310.606084] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.86840 [19310.607070] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.86950 [19310.608087] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.87046 [19310.609134] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.87145 [19310.610119] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.87239 [19310.611112] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.87340 [19310.612089] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.87435 [19310.613048] [] ? ktime_get_ts+0xa9/0xe0
2012-07-23_10:17:09.87526 [19310.614005] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.87618 [19310.614936] [] sys_write+0x51/0x90
2012-07-23_10:17:09.87707 [19310.615845] [] ? sys_poll+0x7c/0x110
2012-07-23_10:17:09.87798 [19310.616727] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.87894 [19310.617633] INFO: task apache2:202886 blocked for more than 120 seconds.
2012-07-23_10:17:09.87980 [19310.618531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.87982 [19310.619440] apache2 D ffff88039cd10b50 0 202886 8729 104 0x00000000
2012-07-23_10:17:09.88074 [19310.620364] ffff8800545d9b58 0000000000000082 ffff8800545d9b48 01ff88027a9f8000
2012-07-23_10:17:09.88171 [19310.621259] ffff88042de4fcf0 ffff88042de4fcdc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.88255 [19310.622170] ffff88020ab46b88 ffff88039cd11108 ffff8800545d9fd8 ffff8800545d9fd8
2012-07-23_10:17:09.88460 [19310.623140] Call Trace:
2012-07-23_10:17:09.88536 [19310.624075] [] ? prepare_to_wait_exclusive+0x4e/0x80
2012-07-23_10:17:09.88629 [19310.625022] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.88730 [19310.625944] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.88819 [19310.626941] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.88912 [19310.627852] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.89003 [19310.628804] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.89099 [19310.629683] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.89195 [19310.630652] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.89285 [19310.631600] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.89370 [19310.632506] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.89463 [19310.633391] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.89552 [19310.634293] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.89650 [19310.635167] [] ? ktime_get_ts+0xa9/0xe0
2012-07-23_10:17:09.89747 [19310.636124] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.89828 [19310.637048] [] sys_write+0x51/0x90
2012-07-23_10:17:09.89912 [19310.637927] [] ? sys_poll+0x7c/0x110
2012-07-23_10:17:09.89997 [19310.638758] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.90078 [19310.639612] INFO: task apache2:202913 blocked for more than 120 seconds.
2012-07-23_10:17:09.90166 [19310.640443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.90167 [19310.641280] apache2 D ffff880396d72280 0 202913 8729 104 0x00000000
2012-07-23_10:17:09.90252 [19310.642150] ffff88007555bb58 0000000000000082 0000000000000000 01ff8802de31f240
2012-07-23_10:17:09.90336 [19310.642964] ffff8803302948f0 ffff8803302948dc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.90422 [19310.643830] ffff8800818bd088 ffff880396d72838 ffff88007555bfd8 ffff88007555bfd8
2012-07-23_10:17:09.90587 [19310.644663] Call Trace:
2012-07-23_10:17:09.90667 [19310.645509] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.90746 [19310.646303] [] ? loopback_xmit+0xa1/0xe0
2012-07-23_10:17:09.90841 [19310.647107] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.90930 [19310.648065] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.91017 [19310.648942] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.91102 [19310.649801] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.91186 [19310.650663] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.91268 [19310.651508] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.91353 [19310.652322] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.91439 [19310.653170] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.91535 [19310.654060] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.91608 [19310.654914] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.91696 [19310.655737] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.91778 [19310.656589] [] sys_write+0x51/0x90
2012-07-23_10:17:09.91860 [19310.657421] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.91938 [19310.658220] INFO: task apache2:202927 blocked for more than 120 seconds.
2012-07-23_10:17:09.92027 [19310.659008] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.92029 [19310.659882] apache2 D ffff8803683db2d0 0 202927 8729 104 0x00000000
2012-07-23_10:17:09.92114 [19310.660760] ffff880066e7fb58 0000000000000082 ffff880066e7fb48 01ff88017f18fd80
2012-07-23_10:17:09.92204 [19310.661651] ffff8800a8e814f0 ffff8800a8e814dc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.92296 [19310.662496] ffff88043038aac8 ffff8803683db888 ffff880066e7ffd8 ffff880066e7ffd8
2012-07-23_10:17:09.92452 [19310.663370] Call Trace:
2012-07-23_10:17:09.92535 [19310.664154] [] ? prepare_to_wait_exclusive+0x4e/0x80
2012-07-23_10:17:09.92622 [19310.664957] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.92703 [19310.665838] [] ? loopback_xmit+0xa1/0xe0
2012-07-23_10:17:09.92785 [19310.666669] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.92865 [19310.667473] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.92947 [19310.668283] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.93029 [19310.669127] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.93113 [19310.669918] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.93199 [19310.670783] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.93283 [19310.671616] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.93361 [19310.672466] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.93447 [19310.673255] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.93531 [19310.674116] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.93613 [19310.674940] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.93693 [19310.675758] [] sys_write+0x51/0x90
2012-07-23_10:17:09.93772 [19310.676559] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.93855 [19310.677360] INFO: task apache2:206371 blocked for more than 120 seconds.
2012-07-23_10:17:09.93940 [19310.678170] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.93941 [19310.678995] apache2 D ffff8800798e4c90 0 206371 8729 104 0x00000000
2012-07-23_10:17:09.94022 [19310.679827] ffff880082245b58 0000000000000086 0000000000000000 01ff880402673900
2012-07-23_10:17:09.94107 [19310.680667] ffff8802b70088f0 ffff8802b70088dc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.94193 [19310.681525] ffff8800971b61c8 ffff8800798e5248 ffff880082245fd8 ffff880082245fd8
2012-07-23_10:17:09.94356 [19310.682367] Call Trace:
2012-07-23_10:17:09.94437 [19310.683176] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.94517 [19310.683983] [] ? loopback_xmit+0xa1/0xe0
2012-07-23_10:17:09.94599 [19310.684816] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.94681 [19310.685618] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.94759 [19310.686443] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.94840 [19310.687234] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.94925 [19310.688028] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.95009 [19310.688905] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.95091 [19310.689707] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.95170 [19310.690528] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.95250 [19310.691343] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.95336 [19310.692119] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.95421 [19310.692994] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.95505 [19310.693845] [] sys_write+0x51/0x90
2012-07-23_10:17:09.95584 [19310.694641] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.95667 [19310.695462] INFO: task apache2:206376 blocked for more than 120 seconds.
2012-07-23_10:17:09.95752 [19310.696282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.95753 [19310.697107] apache2 D ffff8803e1bb8cd0 0 206376 8729 104 0x00000000
2012-07-23_10:17:09.95836 [19310.697941] ffff88005da17b58 0000000000000086 0000000000000000 01ff88021b9ca240
2012-07-23_10:17:09.95918 [19310.698770] ffff880201cf18f0 ffff880201cf18dc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.96001 [19310.699602] ffff8800796e7908 ffff8803e1bb9288 ffff88005da17fd8 ffff88005da17fd8
2012-07-23_10:17:09.96158 [19310.700418] Call Trace:
2012-07-23_10:17:09.96238 [19310.701204] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.96320 [19310.702002] [] ? ipv4_rcv_saddr_equal+0x0/0x60
2012-07-23_10:17:09.96399 [19310.702808] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.96480 [19310.703605] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.96559 [19310.704421] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.96639 [19310.705219] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.96725 [19310.706011] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.96809 [19310.706884] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.96894 [19310.707704] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.97006 [19310.708551] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.97064 [19310.709435] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.97143 [19310.710234] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.97224 [19310.711039] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.97301 [19310.711848] [] sys_write+0x51/0x90
2012-07-23_10:17:09.97386 [19310.712639] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.97464 [19310.713461] INFO: task apache2:206377 blocked for more than 120 seconds.
2012-07-23_10:17:09.97553 [19310.714244] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.97554 [19310.715102] apache2 D ffff8803a9d36b90 0 206377 8729 104 0x00000000
2012-07-23_10:17:09.97641 [19310.715953] ffff88007c811b58 0000000000000086 ffff8801601537c8 01ff88019feccd80
2012-07-23_10:17:09.97728 [19310.716839] ffff88033e379cf0 ffff88033e379cdc 0000000000000000 00000001000006c0
2012-07-23_10:17:09.97816 [19310.717714] ffff88007671c348 ffff8803a9d37148 ffff88007c811fd8 ffff88007c811fd8
2012-07-23_10:17:09.97987 [19310.718638] Call Trace:
2012-07-23_10:17:09.98072 [19310.719477] [] ? prepare_to_wait_exclusive+0x4e/0x80
2012-07-23_10:17:09.98153 [19310.720281] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:09.98238 [19310.721150] [] ? loopback_xmit+0xa1/0xe0
2012-07-23_10:17:09.98318 [19310.721998] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:09.98402 [19310.722802] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:09.98488 [19310.723632] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:09.98568 [19310.724505] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:09.98648 [19310.725281] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:09.98734 [19310.726112] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:09.98818 [19310.726931] [] ? sock_aio_write+0x0/0x150
2012-07-23_10:17:09.98898 [19310.727773] [] ? __wake_up+0x53/0x70
2012-07-23_10:17:09.98987 [19310.728614] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:09.99063 [19310.729431] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:09.99142 [19310.730227] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:09.99230 [19310.731024] [] sys_write+0x51/0x90
2012-07-23_10:17:09.99310 [19310.731890] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:09.99396 [19310.732710] INFO: task munin-update:207812 blocked for more than 120 seconds.
2012-07-23_10:17:09.99481 [19310.733559] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:09.99483 [19310.734386] munin-update D ffff8803a57f5050 0 207812 207809 103 0x00000000
2012-07-23_10:17:09.99567 [19310.735211] ffff880072aabe08 0000000000000082 ffff8803a57f5050 ffff8803a57f5050
2012-07-23_10:17:09.99649 [19310.736036] ffff880072aabec8 ffff8803a57f5050 0000000100000010 ffff8803a57f5050
2012-07-23_10:17:09.99728 [19310.736852] ffff880243fc3890 ffff8803a57f5608 ffff880072aabfd8 ffff880072aabfd8
2012-07-23_10:17:09.99891 [19310.737699] Call Trace:
2012-07-23_10:17:09.99972 [19310.738485] [] __mutex_lock_slowpath+0x13e/0x180
2012-07-23_10:17:10.00050 [19310.739295] [] mutex_lock+0x2b/0x50
2012-07-23_10:17:10.00128 [19310.740093] [] do_unlinkat+0x96/0x1c0
2012-07-23_10:17:10.00207 [19310.740877] [] ? do_page_fault+0x3e/0xa0
2012-07-23_10:17:10.00286 [19310.741663] [] sys_unlink+0x16/0x20
2012-07-23_10:17:10.00367 [19310.742454] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:10.00447 [19310.743272] INFO: task munin-update:207813 blocked for more than 120 seconds.
2012-07-23_10:17:10.00529 [19310.744059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:10.00530 [19310.744846] munin-update D ffff88004ae88800 0 207813 207809 103 0x00000000
2012-07-23_10:17:10.00608 [19310.745659] ffff880078cbfb58 0000000000000086 0000000000000003 ffff880000040b00
2012-07-23_10:17:10.00749 [19310.745664] ffff88004ae88800 00000000000200da ffff880078cbfbd8 ffffffff811352dd
2012-07-23_10:17:10.00807 [19310.745668] ffff88000002fd80 ffff88004ae88db8 ffff880078cbffd8 ffff880078cbffd8
2012-07-23_10:17:10.00847 [19310.745672] Call Trace:
2012-07-23_10:17:10.00913 [19310.745676] [] ? __alloc_pages_nodemask+0x14d/0xb50
2012-07-23_10:17:10.00971 [19310.745681] [] ? prepare_to_wait_exclusive+0x4e/0x80
2012-07-23_10:17:10.01032 [19310.745695] [] cv_wait_common+0x9c/0x1a0 [spl]
2012-07-23_10:17:10.01095 [19310.745699] [] ? autoremove_wake_function+0x0/0x40
2012-07-23_10:17:10.01161 [19310.745705] [] ? avl_find+0x60/0xb0 [zavl]
2012-07-23_10:17:10.01225 [19310.745714] [] __cv_wait+0x13/0x20 [spl]
2012-07-23_10:17:10.01286 [19310.745743] [] zfs_range_lock+0x2ac/0x5c0 [zfs]
2012-07-23_10:17:10.01346 [19310.745746] [] ? page_add_new_anon_rmap+0x9d/0xf0
2012-07-23_10:17:10.01407 [19310.745748] [] ? mutex_lock+0x1e/0x50
2012-07-23_10:17:10.01462 [19310.745765] [] zfs_write+0x252/0xc90 [zfs]
2012-07-23_10:17:10.01531 [19310.745767] [] ? handle_pte_fault+0x2ce/0xf50
2012-07-23_10:17:10.01584 [19310.745769] [] ? mntput_no_expire+0x30/0x110
2012-07-23_10:17:10.01636 [19310.745770] [] ? mntput_no_expire+0x30/0x110
2012-07-23_10:17:10.01693 [19310.745773] [] ? __do_page_fault+0x1e4/0x490
2012-07-23_10:17:10.01746 [19310.745788] [] zpl_write_common+0x52/0x70 [zfs]
2012-07-23_10:17:10.01803 [19310.745803] [] zpl_write+0x68/0xa0 [zfs]
2012-07-23_10:17:10.01857 [19310.745805] [] vfs_write+0xb8/0x1a0
2012-07-23_10:17:10.01914 [19310.745807] [] sys_write+0x51/0x90
2012-07-23_10:17:10.01969 [19310.745809] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:17:10.02029 [19310.745811] INFO: task munin-update:207814 blocked for more than 120 seconds.
2012-07-23_10:17:10.02108 [19310.745812] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2012-07-23_10:17:10.02166 [19310.745813] munin-update D ffff880401070fd0 0 207814 207809 103 0x00000000
2012-07-23_10:17:10.02225 [19310.745815] ffff88007ece3e08 0000000000000086 0000000000000000 ffff880401070fd0
2012-07-23_10:17:10.02283 [19310.745817] ffff88007ece3ec8 ffff880401070fd0 0000000100000010 ffff880401070fd0
2012-07-23_10:17:10.02341 [19310.745819] ffff880115f7f530 ffff880401071588 ffff88007ece3fd8 ffff88007ece3fd8
2012-07-23_10:17:10.02382 [19310.745821] Call Trace:
2012-07-23_10:17:10.02442 [19310.745823] [] __mutex_lock_slowpath+0x13e/0x180
2012-07-23_10:17:10.02500 [19310.745825] [] mutex_lock+0x2b/0x50
2012-07-23_10:17:10.02554 [19310.745827] [] do_unlinkat+0x96/0x1c0
2012-07-23_10:17:10.02617 [19310.745829] [] ? do_page_fault+0x3e/0xa0
2012-07-23_10:17:10.02673 [19310.745831] [] sys_unlink+0x16/0x20
2012-07-23_10:17:10.02735 [19310.745833] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:19:09.68474 [19430.360861] md: stopping all md devices.
2012-07-23_10:19:10.35033 [19431.026137] ------------[ cut here ]------------
2012-07-23_10:19:10.35106 [19431.026868] kernel BUG at drivers/md/md.c:6657!
2012-07-23_10:19:10.35107 [19431.027587] invalid opcode: 0000 [#1] SMP
2012-07-23_10:19:10.35330 [19431.028307] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/manage_start_stop
2012-07-23_10:19:10.35332 [19431.029811] CPU 4
2012-07-23_10:19:10.35409 [19431.029822] Modules linked in: authenc(U) esp4(U) xfrm4_mode_tunnel(U) vzethdev(U) vznetdev(U) simfs(U) vzrst(U) vzcpt(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) vzdquota(U) vzmon(U) vzdev(U) xt_hl(U) tun(U) acpi_cpufreq(U) cpufreq_powersave(U) cpufreq_stats(U) cpufreq_conservative(U) cpufreq_ondemand(U) vzevent(U) xt_TCPMSS(U) act_police(U) cls_flow(U) cls_fw(U) cls_u32(U) sch_htb(U) sch_hfsc(U) sch_ingress(U) sch_sfq(U) xt_time(U) xt_connlimit(U) xt_realm(U) iptable_raw(U) xt_comment(U) xt_recent(U) xt_policy(U) ipt_ULOG(U) ipt_REJECT(U) ipt_REDIRECT(U) ipt_NETMAP(U) ipt_MASQUERADE(U) ipt_ECN(U) ipt_ecn(U) ipt_CLUSTERIP(U) ipt_ah(U) ipt_addrtype(U) nf_nat_tftp(U) nf_nat_snmp_basic(U) nf_conntrack_snmp(U) nf_nat_sip(U) nf_nat_pptp(U) nf_nat_proto_gre(U) nf_nat_irc(U) nf_nat_h323(U) nf_nat_ftp(U) nf_nat_amanda(U) ts_kmp(U) nf_conntrack_amanda(U) nf_conntrack_sane(U) nf_conntrack_tftp(U) nf_conntrack_sip(U) nf_conntrack_proto_sctp(U) nf_conntrack_pptp(U) nf_conntrack_proto_gre(U) nf_conntrack_netlink(U) nf_conntrack_netbios_ns(U) nf_conntrack_broadcast(U) nf_conntrack_irc(U) nf_conntrack_h323(U) nf_conntrack_ftp(U) xt_TPROXY(U) nf_tproxy_core(U) ip6_tables(U) nf_defrag_ipv6(U) xt_tcpmss(U) xt_pkttype(U) xt_physdev(U) xt_owner(U) xt_NFQUEUE(U) xt_NFLOG(U) nfnetlink_log(U) xt_multiport(U) xt_MARK(U) xt_mark(U) xt_mac(U) xt_limit(U) xt_length(U) xt_iprange(U) xt_helper(U) xt_hashlimit(U) xt_DSCP(U) xt_dscp(U) xt_dccp(U) xt_conntrack(U) xt_CONNMARK(U) xt_connmark(U) xt_CLASSIFY(U) ipt_LOG(U) xt_state(U) iptable_nat(U) nf_nat(U) nf_conntrack_ipv4(U) nf_defrag_ipv4(U) nf_conntrack(U) iptable_mangle(U) nfnetlink(U) deflate(U) ctr(U) twofish_x86_64(U) twofish_common(U) camellia(U) serpent(U) blowfish(U) cast5(U) des_generic(U) cbc(U) iptable_filter(U) ip_tables(U) aesni_intel(U) cryptd(U) aes_x86_64(U) aes_generic(U) xcbc(U) rmd160(U) sha256_generic(U) crypto_null(U) af_key(U) ip_gre(U) ipv6(U) ext3(U) jbd(U) netconsole(U) 
configfs(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate(U) i915(U) drm_kms_helper(U) drm(U) snd_pcsp(U) i2c_algo_bit(U) snd_pcm(U) snd_timer(U) snd(U) soundcore(U) video(U) i2c_i801(U) tpm_tis(U) i2c_core(U) tpm(U) tpm_bios(U) snd_page_alloc(U) output(U) ext4(U) mbcache(U) jbd2(U) dm_mod(U) freq_table(U) mperf(U) aacraid(U) 3w_9xxx(U) 3w_xxxx(U) raid10(U) raid456(U) async_raid6_recov(U) async_pq(U) raid6_pq(U) async_xor(U) xor(U) async_memcpy(U) async_tx(U) raid1(U) raid0(U) sata_nv(U) sata_sil(U) sata_via(U) sd_mod(U) crc_t10dif(U) r8169(U) mii(U) ahci(U) xhci_hcd(U) [last unloaded: scsi_wait_scan]
2012-07-23_10:19:10.37662 [19431.051971]
2012-07-23_10:19:10.37784 [19431.053129] Pid: 207556, comm: mysqld veid: 0 Tainted: P W ---------------- 2.6.32-042stab057.1-el6-openvz-zfsissue842 #1 042stab057 System manufacturer System Product Name/P8H67-M PRO
2012-07-23_10:19:10.37906 [19431.055557] RIP: 0010:[] [] md_write_start+0x1bb/0x1c0
2012-07-23_10:19:10.38153 [19431.056806] RSP: 0018:ffff880092d238e8 EFLAGS: 00010246
2012-07-23_10:19:10.38280 [19431.058053] RAX: 0000000000000001 RBX: ffff880439f37c00 RCX: 0000000000000001
2012-07-23_10:19:10.38406 [19431.059318] RDX: 0000000000000000 RSI: ffff8802493d1540 RDI: ffff880439f37c00
2012-07-23_10:19:10.38532 [19431.060580] RBP: ffff880092d23938 R08: 0000000000000246 R09: ffffe8ffffd02638
2012-07-23_10:19:10.38656 [19431.061832] R10: ffff8803b98af8f8 R11: 0000000000000000 R12: ffff88043a117600
2012-07-23_10:19:10.38780 [19431.063080] R13: ffff8802493d1540 R14: ffff880439f37c00 R15: 0000000000000001
2012-07-23_10:19:10.38924 [19431.064318] FS: 00007f51ffa40700(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
2012-07-23_10:19:10.39031 [19431.065572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2012-07-23_10:19:10.39158 [19431.066825] CR2: 00007f5294fd2000 CR3: 0000000418156000 CR4: 00000000000406e0
2012-07-23_10:19:10.39283 [19431.068085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2012-07-23_10:19:10.39408 [19431.069340] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2012-07-23_10:19:10.39557 [19431.070592] Process mysqld (pid: 207556, veid: 0, threadinfo ffff880092d22000, task ffff88043b4ef210)
2012-07-23_10:19:10.39664 [19431.071869] Stack:
2012-07-23_10:19:10.39666 [19431.073137] ffff880092d23938 0000000000000286 ffff88043b4efbd0 ffff88043b4efbd0
2012-07-23_10:19:10.39798 [19431.073181] <0> ffffffff823c2920 ffffea00003895c0 ffff880092d23938 ffff880439f37c00
2012-07-23_10:19:10.39937 [19431.074500] <0> ffff88043a117600 ffff8802493d1540 ffff880092d23a08 ffffffffa009805a
2012-07-23_10:19:10.40190 [19431.077115] Call Trace:
2012-07-23_10:19:10.40324 [19431.078423] [] make_request+0x4a/0x800 [raid1]
2012-07-23_10:19:10.40457 [19431.079738] [] ? mempool_alloc_slab+0x15/0x20
2012-07-23_10:19:10.40583 [19431.081052] [] ? mempool_alloc+0x65/0x150
2012-07-23_10:19:10.40710 [19431.082340] [] ? throtl_find_tg+0x46/0x60
2012-07-23_10:19:10.40841 [19431.083607] [] md_make_request+0xd3/0x210
2012-07-23_10:19:10.40962 [19431.084879] [] generic_make_request+0x2b2/0x5c0
2012-07-23_10:19:10.41085 [19431.086129] [] submit_bio+0xf5/0x1a0
2012-07-23_10:19:10.41206 [19431.087357] [] ? __bio_clone+0x26/0x70
2012-07-23_10:19:10.41326 [19431.088571] [] ext4_io_submit+0x56/0x80 [ext4]
2012-07-23_10:19:10.41442 [19431.089761] [] mpage_da_submit_io+0x194/0x1d0 [ext4]
2012-07-23_10:19:10.41557 [19431.090931] [] ? jbd2_journal_start+0xb5/0x100 [jbd2]
2012-07-23_10:19:10.41671 [19431.092084] [] ext4_da_writepages+0x43c/0x690 [ext4]
2012-07-23_10:19:10.41782 [19431.093212] [] ? generic_file_aio_write+0xbe/0xe0
2012-07-23_10:19:10.41890 [19431.094318] [] ? do_sync_write+0xfa/0x140
2012-07-23_10:19:10.41996 [19431.095401] [] do_writepages+0x21/0x40
2012-07-23_10:19:10.42101 [19431.096458] [] __filemap_fdatawrite_range+0x5b/0x60
2012-07-23_10:19:10.42203 [19431.097502] [] filemap_write_and_wait_range+0x5a/0x90
2012-07-23_10:19:10.42304 [19431.098529] [] vfs_fsync_range+0xc0/0x1a0
2012-07-23_10:19:10.42401 [19431.099533] [] vfs_fsync+0x1d/0x20
2012-07-23_10:19:10.42496 [19431.100509] [] do_fsync+0x60/0xa0
2012-07-23_10:19:10.42590 [19431.101461] [] sys_fsync+0x10/0x20
2012-07-23_10:19:10.42693 [19431.102396] [] system_call_fastpath+0x16/0x1b
2012-07-23_10:19:10.42694 [19431.103308] Code: c7 83 c4 01 00 00 00 00 00 00 f0 80 4b 28 02 f0 80 4b 28 04 48 8b bb 50 01 00 00 41 bc 01 00 00 00 e8 ba 74 ff ff e9 5e ff ff ff <0f> 0b eb fe 90 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0
2012-07-23_10:19:10.42895 [19431.105389] RIP [] md_write_start+0x1bb/0x1c0
2012-07-23_10:19:10.43072 [19431.106321] RSP
2012-07-23_10:19:10.43527 [19431.110848] ---[ end trace 15a9384deac5864f ]---
2012-07-23_10:19:10.43617 [19431.111763] Kernel panic - not syncing: Fatal exception
2012-07-23_10:19:10.43819 [19431.112665] Pid: 207556, comm: mysqld veid: 0 Tainted: P D W ---------------- 2.6.32-042stab057.1-el6-openvz-zfsissue842 #1
2012-07-23_10:19:10.43886 [19431.114465] Call Trace:
2012-07-23_10:19:10.43975 [19431.115351] [] ? panic+0x78/0x143
2012-07-23_10:19:10.44063 [19431.116239] [] ? oops_end+0xe4/0x100
2012-07-23_10:19:10.44150 [19431.117121] [] ? die+0x5b/0x90
2012-07-23_10:19:10.44237 [19431.117994] [] ? do_trap+0xc4/0x170
2012-07-23_10:19:10.44323 [19431.118863] [] ? do_invalid_op+0x95/0xb0
2012-07-23_10:19:10.44409 [19431.119725] [] ? md_write_start+0x1bb/0x1c0
2012-07-23_10:19:10.44493 [19431.120579] [] ? swiotlb_map_page+0x7b/0xe0
2012-07-23_10:19:10.44578 [19431.121428] [] ? rtl8169_start_xmit+0x2df/0x520 [r8169]
2012-07-23_10:19:10.44663 [19431.122270] [] ? cpumask_next_and+0x29/0x50
2012-07-23_10:19:10.44747 [19431.123107] [] ? invalid_op+0x1b/0x20
2012-07-23_10:19:10.44829 [19431.123945] [] ? md_write_start+0x1bb/0x1c0
2012-07-23_10:19:10.44911 [19431.124772] [] ? load_balance_fair+0x1c8/0x2a0
2012-07-23_10:19:10.44991 [19431.125589] [] ? make_request+0x4a/0x800 [raid1]
2012-07-23_10:19:10.45070 [19431.126394] [] ? mempool_alloc_slab+0x15/0x20
2012-07-23_10:19:10.45149 [19431.127189] [] ? mempool_alloc+0x65/0x150
2012-07-23_10:19:10.45227 [19431.127973] [] ? throtl_find_tg+0x46/0x60
2012-07-23_10:19:10.45301 [19431.128747] [] ? md_make_request+0xd3/0x210
2012-07-23_10:19:10.45380 [19431.129517] [] ? generic_make_request+0x2b2/0x5c0
2012-07-23_10:19:10.45458 [19431.130286] [] ? submit_bio+0xf5/0x1a0
2012-07-23_10:19:10.45534 [19431.131057] [] ? __bio_clone+0x26/0x70
2012-07-23_10:19:10.45617 [19431.131836] [] ? ext4_io_submit+0x56/0x80 [ext4]
2012-07-23_10:19:10.45690 [19431.132620] [] ? mpage_da_submit_io+0x194/0x1d0 [ext4]
2012-07-23_10:19:10.45770 [19431.133406] [] ? jbd2_journal_start+0xb5/0x100 [jbd2]
2012-07-23_10:19:10.45853 [19431.134194] [] ? ext4_da_writepages+0x43c/0x690 [ext4]
2012-07-23_10:19:10.45928 [19431.134973] [] ? generic_file_aio_write+0xbe/0xe0
2012-07-23_10:19:10.46004 [19431.135757] [] ? do_sync_write+0xfa/0x140
2012-07-23_10:19:10.46083 [19431.136537] [] ? do_writepages+0x21/0x40
2012-07-23_10:19:10.46161 [19431.137313] [] ? __filemap_fdatawrite_range+0x5b/0x60
2012-07-23_10:19:10.46239 [19431.138093] [] ? filemap_write_and_wait_range+0x5a/0x90
2012-07-23_10:19:10.46316 [19431.138871] [] ? vfs_fsync_range+0xc0/0x1a0
2012-07-23_10:19:10.46394 [19431.139651] [] ? vfs_fsync+0x1d/0x20
2012-07-23_10:19:10.46471 [19431.140423] [] ? do_fsync+0x60/0xa0
2012-07-23_10:19:10.46546 [19431.141187] [] ? sys_fsync+0x10/0x20
2012-07-23_10:19:10.46624 [19431.141941] [] ? system_call_fastpath+0x16/0x1b
2012-07-23_10:19:10.46701 [19431.142703] panic occurred, switching back to text console
2012-07-23_10:19:10.46715 [19431.143618] Rebooting in 20 seconds..
2012-07-23_10:19:30.46689 [19451.131690] ACPI MEMORY or I/O RESET_REG.

@behlendorf
Contributor

How exactly is this system configured? While the original stacks are clearly from ZFS, the last one, which panicked the system, came from the MD driver. Apparently that device was somehow marked read-only.

void md_write_start(mddev_t *mddev, struct bio *bi)
{
        int did_change = 0;
        if (bio_data_dir(bi) != WRITE)
                return;

>>>     BUG_ON(mddev->ro == 1);
        ...
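
To make the check above concrete, here is a minimal userspace sketch of the condition it tests. It is not kernel code: `struct fake_mddev`, `enum fake_bio_dir`, and `write_would_trip_bug_on` are hypothetical names invented for illustration, and only the `ro` field is modelled. It assumes, as in the excerpt, that reads return early and that a write to a device with `ro == 1` hits the `BUG_ON`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's mddev_t; only the ro flag is modelled. */
struct fake_mddev {
    int ro; /* 1 means the array is marked read-only */
};

/* Hypothetical stand-in for the result of bio_data_dir(). */
enum fake_bio_dir { FAKE_READ, FAKE_WRITE };

/*
 * Returns true when a request would reach BUG_ON(mddev->ro == 1)
 * with the condition satisfied, i.e. a write to a read-only array.
 */
bool write_would_trip_bug_on(const struct fake_mddev *mddev,
                             enum fake_bio_dir dir)
{
    if (dir != FAKE_WRITE)
        return false; /* reads return before the check */
    return mddev->ro == 1;
}
```

In the trace above, the `fsync` path submitted a write through ext4 to the MD device, which is the case where this sketch returns true.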

@ryao
Contributor

ryao commented Jul 23, 2012

@alexclear Why are you using MD RAID? ZFS is meant to handle that for you.

@alexclear
Author

@behlendorf sda1-sda3 and sdb1-sdb3 are parts of MD RAID1 devices, sda4 and sdb4 form a zpool. So, /, /boot and swap use MD devices and the ZFS partition is mounted at /home.

@alexclear
Author

@ryao This is a dedicated box on cheap hosting, and I didn't have much control over it at the OS installation step, so I just converted one of the predefined MD RAIDs to the zpool mirror.

@alexclear
Author

The latest crashes seem to share a common pattern: http://prntscr.com/ckf4s (well, the system does not crash until I do "reboot -n -f"; everything is just stuck in the D state). So I guess limiting the max ARC size to 2G should help. Anyway, this bug is not related to my original problems.
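
For reference, a 2G ARC cap is usually expressed in bytes through the `zfs_arc_max` module parameter. The sketch below only does the arithmetic and formats a modprobe-style options line; the helper names `gib_to_bytes` and `format_arc_max_option` are hypothetical, and placing the resulting line in a file such as `/etc/modprobe.d/zfs.conf` is an assumption about the setup, not something stated in this thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a size in GiB to bytes (1 GiB = 2^30 bytes). */
uint64_t gib_to_bytes(uint64_t gib)
{
    return gib << 30;
}

/*
 * Write a modprobe-style options line capping the ARC at `gib` GiB
 * into buf. Returns the value of snprintf(), i.e. the length the
 * full line needs, so callers can detect truncation.
 */
int format_arc_max_option(char *buf, size_t len, uint64_t gib)
{
    return snprintf(buf, len, "options zfs zfs_arc_max=%" PRIu64,
                    gib_to_bytes(gib));
}
```

Calling `format_arc_max_option(buf, sizeof buf, 2)` yields `options zfs zfs_arc_max=2147483648`. A value set this way applies when the module is next loaded.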

@alexclear
Author

Putting a 2G cap on the ARC has not helped; it seems that memory is leaking somehow.

@behlendorf
Contributor

@alexclear Numerous memory management fixes have been merged into the master source. I'd be interested to hear if you're able to reproduce your hung task issue with the latest code.

@ryao
Contributor

ryao commented Sep 30, 2012

@alexclear Are you using a NUMA system?

@behlendorf
Contributor

Where does this stand? Is it reproducible with the latest code?

@alexclear
Author

@behlendorf This one can be closed, I haven't seen it for months, thank you!

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
…nzfs#842)

Bumps [tokio-stream](https://github.com/tokio-rs/tokio) from 0.1.12 to 0.1.14.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Changelog](https://github.com/tokio-rs/tokio/blob/tokio-0.1.14/CHANGELOG.md)
- [Commits](tokio-rs/tokio@tokio-stream-0.1.12...tokio-0.1.14)

---
updated-dependencies:
- dependency-name: tokio-stream
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>