Kernel dump on heavy load #1066
Those backtraces look like diagnostic messages that could have been written by the kernel hang timer rather than actual kernel issues. Did this system crash or did it just output those messages? It looks like a few disks had errors, which might have caused them to become unresponsive while they did error handling. Also, it appears that you have zero redundancy, which is why ZFS was unable to recover.
I agree, the stacks are just diagnostic and are almost certainly due to slow/failing disk I/O.
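To see at a glance which tasks the kernel reported as blocked, the task header lines in a log like the one in this issue can be filtered for state `D` (uninterruptible sleep). The following is only a minimal sketch: the helper name `blocked_tasks` and the regular expression are illustrative assumptions based on the header layout visible in this log (name, state, one hex word, free stack, PID, parent PID, flags), and other kernel versions may print a different layout.

```python
import re

# Assumed header layout, taken from the log lines in this issue, e.g.:
#   "... kernel: [318473.584974] txg_sync D 0000000000000003 0 545 2 0x00000000"
TASK_HEADER = re.compile(
    r"\]\s+(?P<name>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]{16}"
    r"\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x[0-9a-f]+\s*$"
)


def blocked_tasks(lines):
    """Return (task name, pid) for every task header that reports state D."""
    found = []
    for line in lines:
        match = TASK_HEADER.search(line)
        if match and match.group("state") == "D":
            found.append((match.group("name"), int(match.group("pid"))))
    return found


if __name__ == "__main__":
    sample = [
        "Oct 23 03:13:21 fshh5 kernel: [318473.584974] txg_sync D 0000000000000003 0 545 2 0x00000000",
        "Oct 23 03:13:21 fshh5 kernel: [318473.584995] [] schedule+0x29/0x70",
    ]
    for name, pid in blocked_tasks(sample):
        print(f"{name} (pid {pid}) is blocked in state D")
```

A list like this only shows who was waiting, not why; here every listed stack ends in `zio_wait`, which fits the slow or failing disk I/O explanation.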
Really strange, because the hard disks are all the same. Here is a short structure:
I can't use single disks on this system, because it's an emergency/backup system: if the main server goes down, some virtual machines must run on it. I couldn't see any lags on this system during this time, but that may not mean anything. Is it possible to increase the wait time?
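On the wait time: assuming these messages come from the Linux hung task detector (the "kernel hang timer" mentioned above), the threshold is exposed as `/proc/sys/kernel/hung_task_timeout_secs` on kernels built with hung task detection. The sketch below reads and changes that value; the helper names `read_timeout` and `write_timeout` are made up for illustration, and writing to the real file requires root. Raising the value only delays or silences the warning, it does not make the underlying I/O any faster.

```python
from pathlib import Path

# Sysctl file of the hung task detector (present only if the kernel enables it).
HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")


def read_timeout(path=HUNG_TASK_TIMEOUT):
    """Return the current timeout in seconds (0 means the warning is disabled)."""
    return int(Path(path).read_text().strip())


def write_timeout(seconds, path=HUNG_TASK_TIMEOUT):
    """Set a new timeout in seconds; needs root for the real /proc file."""
    if seconds < 0:
        raise ValueError("timeout must be >= 0")
    Path(path).write_text(f"{seconds}\n")


if __name__ == "__main__":
    print("current hung task timeout:", read_timeout(), "seconds")
```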
My system is updated to version: ZFS: Loaded module v0.6.0.86-rc12, ZFS pool version 28, ZFS filesystem version 5. I've got dumps again on heavy load: But... the system doesn't crash/hang anymore! That's nice! Any hints?
@StuFu Can you try manually undefining HAVE_SHRINK in your zfs_config.h when building the source? This will prevent us from calling
Closing as a duplicate of #790
Hi,
again... another server, heavy load with rsync and rm:
--- cut ---
Oct 23 03:13:21 fshh5 kernel: [318473.583980] zfs_iput_taskq/ D ffffffff8180d2e0 0 543 2 0x00000000
Oct 23 03:13:21 fshh5 kernel: [318473.583986] ffff88022c087890 0000000000000046 0000000000000000 0000000000000000
Oct 23 03:13:21 fshh5 kernel: [318473.583991] ffff88022c087fd8 ffff88022c087fd8 ffff88022c087fd8 0000000000013980
Oct 23 03:13:21 fshh5 kernel: [318473.583995] ffffffff81c14440 ffff88022c481700 ffff88022c0878a0 ffff88018044f5c8
Oct 23 03:13:21 fshh5 kernel: [318473.583998] Call Trace:
Oct 23 03:13:21 fshh5 kernel: [318473.584040] [] schedule+0x29/0x70
Oct 23 03:13:21 fshh5 kernel: [318473.584168] [] cv_wait_common+0x98/0x190 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.584190] [] ? add_wait_queue+0x60/0x60
Oct 23 03:13:21 fshh5 kernel: [318473.584199] [] __cv_wait+0x13/0x20 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.584355] [] zio_wait+0xfb/0x170 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584377] [] dbuf_read+0x337/0x840 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584404] [] dmu_buf_hold+0x107/0x1d0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584434] [] zap_get_leaf_byblk+0x4f/0x2d0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584464] [] ? dnode_hold_impl+0x2d9/0x5b0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584497] [] zap_deref_leaf+0x6d/0x80 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584528] [] fzap_cursor_retrieve+0xdf/0x280 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584532] [] ? _raw_spin_lock+0xe/0x20
Oct 23 03:13:21 fshh5 kernel: [318473.584549] [] ? iput_final+0x121/0x210
Oct 23 03:13:21 fshh5 kernel: [318473.584585] [] zap_cursor_retrieve+0x6b/0x320 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584618] [] zfs_unlinked_drain+0x5f/0x120 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.584628] [] ? dequeue_entity+0x111/0x200
Oct 23 03:13:21 fshh5 kernel: [318473.584636] [] ? kmem_free_debug+0x4b/0x150 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.584645] [] ? kfree+0x114/0x140
Oct 23 03:13:21 fshh5 kernel: [318473.584656] [] ? kmem_free_debug+0x4b/0x150 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.584664] [] taskq_thread+0x23b/0x590 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.584673] [] ? try_to_wake_up+0x200/0x200
Oct 23 03:13:21 fshh5 kernel: [318473.584683] [] ? task_done+0x140/0x140 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.584687] [] kthread+0x93/0xa0
Oct 23 03:13:21 fshh5 kernel: [318473.584697] [] kernel_thread_helper+0x4/0x10
Oct 23 03:13:21 fshh5 kernel: [318473.584700] [] ? kthread_freezable_should_stop+0x70/0x70
Oct 23 03:13:21 fshh5 kernel: [318473.584703] [] ? gs_change+0x13/0x13
Oct 23 03:13:21 fshh5 kernel: [318473.584974] txg_sync D 0000000000000003 0 545 2 0x00000000
Oct 23 03:13:21 fshh5 kernel: [318473.584977] ffff8802291f9bd0 0000000000000046 ffff8802291f9b70 ffffffff8108b482
Oct 23 03:13:21 fshh5 kernel: [318473.584981] ffff8802291f9fd8 ffff8802291f9fd8 ffff8802291f9fd8 0000000000013980
Oct 23 03:13:21 fshh5 kernel: [318473.584985] ffff880232ee4500 ffff88022c484500 ffff8802291f9be0 ffff88017e050b28
Oct 23 03:13:21 fshh5 kernel: [318473.584988] Call Trace:
Oct 23 03:13:21 fshh5 kernel: [318473.584992] [] ? default_wake_function+0x12/0x20
Oct 23 03:13:21 fshh5 kernel: [318473.584995] [] schedule+0x29/0x70
Oct 23 03:13:21 fshh5 kernel: [318473.585005] [] cv_wait_common+0x98/0x190 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.585007] [] ? add_wait_queue+0x60/0x60
Oct 23 03:13:21 fshh5 kernel: [318473.585015] [] __cv_wait+0x13/0x20 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.585043] [] zio_wait+0xfb/0x170 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585074] [] dsl_pool_sync+0xca/0x450 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585103] [] spa_sync+0x38e/0xa00 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585132] [] txg_sync_thread+0x286/0x450 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585162] [] ? txg_init+0x250/0x250 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585169] [] thread_generic_wrapper+0x78/0x90 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.585176] [] ? __thread_create+0x310/0x310 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.585179] [] kthread+0x93/0xa0
Oct 23 03:13:21 fshh5 kernel: [318473.585183] [] kernel_thread_helper+0x4/0x10
Oct 23 03:13:21 fshh5 kernel: [318473.585186] [] ? kthread_freezable_should_stop+0x70/0x70
Oct 23 03:13:21 fshh5 kernel: [318473.585188] [] ? gs_change+0x13/0x13
Oct 23 03:13:21 fshh5 kernel: [318473.585490] rm D ffffffff8180d2e0 0 13171 13161 0x00000000
Oct 23 03:13:21 fshh5 kernel: [318473.585494] ffff8801d8ad3758 0000000000000082 0000000000000000 0000000000000000
Oct 23 03:13:21 fshh5 kernel: [318473.585498] ffff8801d8ad3fd8 ffff8801d8ad3fd8 ffff8801d8ad3fd8 0000000000013980
Oct 23 03:13:21 fshh5 kernel: [318473.585502] ffff880234160000 ffff880220164500 ffff8801d8ad3768 ffff8801b03cd2e8
Oct 23 03:13:21 fshh5 kernel: [318473.585505] Call Trace:
Oct 23 03:13:21 fshh5 kernel: [318473.585509] [] schedule+0x29/0x70
Oct 23 03:13:21 fshh5 kernel: [318473.585517] [] cv_wait_common+0x98/0x190 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.585520] [] ? add_wait_queue+0x60/0x60
Oct 23 03:13:21 fshh5 kernel: [318473.585528] [] __cv_wait+0x13/0x20 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.585556] [] zio_wait+0xfb/0x170 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585575] [] dbuf_read+0x337/0x840 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585595] [] dmu_buf_hold+0x107/0x1d0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585628] [] zap_get_leaf_byblk+0x4f/0x2d0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585632] [] ? __kmalloc+0x14d/0x1a0
Oct 23 03:13:21 fshh5 kernel: [318473.585661] [] zap_deref_leaf+0x6d/0x80 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585690] [] fzap_add_cd+0x46/0x110 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585720] [] fzap_add+0x71/0x80 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585749] [] zap_add+0x11d/0x1a0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585768] [] ? snprintf+0x34/0x40
Oct 23 03:13:21 fshh5 kernel: [318473.585798] [] zap_add_int+0x74/0xa0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585826] [] ? sa_bulk_update_impl+0x6d/0x120 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585860] [] zfs_unlinked_add+0x47/0xb0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585889] [] zfs_link_destroy+0x491/0x4c0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585919] [] zfs_rmdir+0x5ed/0x820 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585922] [] ? _raw_spin_lock+0xe/0x20
Oct 23 03:13:21 fshh5 kernel: [318473.585925] [] ? __d_lookup+0x125/0x170
Oct 23 03:13:21 fshh5 kernel: [318473.585957] [] zpl_rmdir+0x4b/0x70 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.585962] [] vfs_rmdir.part.26+0xb7/0x110
Oct 23 03:13:21 fshh5 kernel: [318473.585965] [] vfs_rmdir+0x3f/0x60
Oct 23 03:13:21 fshh5 kernel: [318473.585968] [] do_rmdir+0x12a/0x130
Oct 23 03:13:21 fshh5 kernel: [318473.585975] [] ? fput+0x25/0x30
Oct 23 03:13:21 fshh5 kernel: [318473.585985] [] ? filp_close+0x66/0x90
Oct 23 03:13:21 fshh5 kernel: [318473.585988] [] sys_unlinkat+0x2d/0x40
Oct 23 03:13:21 fshh5 kernel: [318473.585992] [] system_call_fastpath+0x1a/0x1f
Oct 23 03:13:21 fshh5 kernel: [318473.586260] rsync D 0000000000000007 0 21298 21292 0x00000000
Oct 23 03:13:21 fshh5 kernel: [318473.586264] ffff88021dd45b88 0000000000000086 0000000000000000 0000000000000000
Oct 23 03:13:21 fshh5 kernel: [318473.586267] ffff88021dd45fd8 ffff88021dd45fd8 ffff88021dd45fd8 0000000000013980
Oct 23 03:13:21 fshh5 kernel: [318473.586271] ffff880232f15c00 ffff88023420dc00 ffff88021dd45b98 ffff8801ee3ecf08
Oct 23 03:13:21 fshh5 kernel: [318473.586274] Call Trace:
Oct 23 03:13:21 fshh5 kernel: [318473.586278] [] schedule+0x29/0x70
Oct 23 03:13:21 fshh5 kernel: [318473.586287] [] cv_wait_common+0x98/0x190 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.586290] [] ? add_wait_queue+0x60/0x60
Oct 23 03:13:21 fshh5 kernel: [318473.586297] [] __cv_wait+0x13/0x20 [spl]
Oct 23 03:13:21 fshh5 kernel: [318473.586325] [] zio_wait+0xfb/0x170 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586347] [] dmu_buf_hold_array_by_dnode+0x207/0x550 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586369] [] dmu_buf_hold_array+0x65/0x90 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586379] [] ? avl_add+0x38/0x50 [zavl]
Oct 23 03:13:21 fshh5 kernel: [318473.586405] [] dmu_read_uio+0x41/0xd0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586418] [] ? mutex_lock+0x1d/0x50
Oct 23 03:13:21 fshh5 kernel: [318473.586447] [] zfs_read+0x168/0x4a0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586451] [] ? putname+0x35/0x50
Oct 23 03:13:21 fshh5 kernel: [318473.586479] [] zpl_read_common+0x52/0x80 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586506] [] zpl_read+0x68/0xa0 [zfs]
Oct 23 03:13:21 fshh5 kernel: [318473.586510] [] vfs_read+0xb0/0x180
Oct 23 03:13:21 fshh5 kernel: [318473.586512] [] sys_read+0x4a/0x90
Oct 23 03:13:21 fshh5 kernel: [318473.586515] [] system_call_fastpath+0x1a/0x1f
--- cut ---
pool: fshh5
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 9h13m with 0 errors on Thu Oct 18 01:29:23 2012
config:
errors: Permanent errors have been detected in the following files:
My system:
Ubuntu, kernel 3.5.4-030504-generic, under VMware
Latest ZFS / SPL
The VMware log shows no problems: no latency logged or anything else.
Any hints for me?