txg_sync hangs zpool or zfs commands *without* maxing out CPU #1640
Comments
I ran into something similar this morning. Unlike all the other times it has happened, there were no rollbacks involved this time, just a zfs send | zfs receive on the same host. The process that hung was zfs receive; the pool I was sending from was fine, but the pool I was receiving to was locked hard. The system stayed completely responsive, including a VM on the filesystem I was sending from, but system load was > 200 (on an 8-core host), and I had to fall back on echo b > /proc/sysrq-trigger, since anything touching the receiving pool (even a zfs list) would hang irretrievably.
FWIW - my zfs send | zfs receive was sending a recursive stream (-R) to create a new filesystem on the target pool:
After rebooting with echo b > /proc/sysrq-trigger, the backup/images target filesystem had been created successfully and held the first snapshot of the stream, intact and correct. None of the remaining snapshots were present, so it appears to have locked up immediately after applying the first snapshot of the stream.
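A load above 200 on an otherwise responsive 8-core host is consistent with many tasks stuck in uninterruptible sleep (state D), because the Linux load average counts those tasks as well as runnable ones. The following is a minimal sketch, not something taken from this thread, of how one might list such tasks by reading /proc/<pid>/stat. The function names are hypothetical, and the scan only covers top-level /proc entries, not individual threads.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks currently in state D."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Run while the pool is hung, a long list of zfs-related names in the output would suggest the load figure comes from blocked I/O waiters rather than CPU work.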
@jrssystemsnet If it occurs again can you grab back traces with |
Closing as stale. If anyone is still hitting this, let us know and we'll reopen it.
This issue looks similar to the closed / investigated issues listed below, with one difference: my CPUs are idle. I can reproduce it by unmounting my pool, disconnecting the disks and powering them down, then trying to remount everything. I am opening a new issue because this might be a different angle on the same problem or a different problem entirely.
The existing similar issues are:
#1070
#1101
#1600
I am using the latest stable ubuntu-zfs release from the PPA as of August 9th, 2013, on Ubuntu 13.04.
My trace is:
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862355] INFO: task txg_sync:4738 blocked for more than 120 seconds.
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862358] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862359] txg_sync D ffff88016fa53f40 0 4738 2 0x00000000
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862362] ffff88007af2bb98 0000000000000046 ffff880034511740 ffff88007af2bfd8
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862365] ffff88007af2bfd8 ffff88007af2bfd8 ffff880076bd1740 ffff880034511740
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862368] ffff880034511740 ffff88016fa547f8 ffff880059d294b0 0000000000000001
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862370] Call Trace:
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862377] [] schedule+0x29/0x70
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862379] [] io_schedule+0x8f/0xd0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862397] [] cv_wait_common+0xa8/0x1b0 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862400] [] ? finish_wait+0x80/0x80
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862407] [] __cv_wait_io+0x18/0x20 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862431] [] zio_wait+0x103/0x1a0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862448] [] dsl_pool_sync+0x348/0x5d0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862466] [] spa_sync+0x3e8/0xa60 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862470] [] ? ktime_get_ts+0x48/0xe0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862488] [] txg_sync_thread+0x323/0x590 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862506] [] ? txg_init+0x250/0x250 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862512] [] thread_generic_wrapper+0x78/0x90 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862518] [] ? __thread_create+0x340/0x340 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862519] [] kthread+0xc0/0xd0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862521] [] ? kthread_create_on_node+0x120/0x120
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862524] [] ret_from_fork+0x7c/0xb0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862525] [] ? kthread_create_on_node+0x120/0x120
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862528] INFO: task mount.zfs:20919 blocked for more than 120 seconds.
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862530] mount.zfs D ffff88016fa13f40 0 20919 1 0x00000004
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862532] ffff88012ab95758 0000000000000082 ffff880166685d00 ffff88012ab95fd8
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862534] ffff88012ab95fd8 ffff88012ab95fd8 ffff880076ad8000 ffff880166685d00
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862536] ffff880166685d00 ffff88016fa147f8 ffff8800028977e0 0000000000000001
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862538] Call Trace:
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862541] [] schedule+0x29/0x70
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862543] [] io_schedule+0x8f/0xd0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862549] [] cv_wait_common+0xa8/0x1b0 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862551] [] ? finish_wait+0x80/0x80
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862557] [] __cv_wait_io+0x18/0x20 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862574] [] zio_wait+0x103/0x1a0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862586] [] dbuf_read+0x314/0x7e0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862598] [] __dbuf_hold_impl+0x3cc/0x450 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862609] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862620] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862631] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862642] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862653] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862665] [] dbuf_hold_impl+0x8d/0xc0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862676] [] dbuf_hold+0x20/0x40 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862691] [] dnode_hold_impl+0x2d0/0x5f0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862695] [] ? __kmalloc+0x13d/0x170
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862700] [] ? kmem_alloc_debug+0x96/0x3b0 [spl]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862714] [] dnode_hold+0x19/0x20 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862726] [] dmu_buf_hold+0x4a/0x1b0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862729] [] ? mutex_lock+0x1d/0x50
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862747] [] zap_lockdir+0x5a/0x760 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862764] [] ? zrl_init+0x36/0x40 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862766] [] ? mutex_lock+0x1d/0x50
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862784] [] zap_lookup_norm+0x4a/0x190 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862801] [] zap_lookup+0x33/0x40 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862818] [] zfs_get_zplprop+0x57/0xc0 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862834] [] zfs_sb_create+0xa6/0x560 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862836] [] ? down_write+0x12/0x40
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862852] [] zfs_domount+0x2f/0x270 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862855] [] ? get_anon_bdev+0x110/0x110
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862871] [] ? zpl_mount+0x30/0x30 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862886] [] zpl_fill_super+0xe/0x20 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862888] [] mount_nodev+0x56/0xb0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862904] [] zpl_mount+0x25/0x30 [zfs]
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862906] [] mount_fs+0x43/0x1b0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862909] [] vfs_kern_mount+0x74/0x110
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862912] [] do_mount+0x21f/0xac0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862915] [] ? __get_free_pages+0xe/0x50
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862917] [] ? copy_mount_options+0x36/0x170
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862919] [] sys_mount+0x8e/0xe0
Aug 9 17:23:53 pmb-LinuxBookAir kernel: [23723.862921] [] system_call_fastpath+0x1a/0x1f
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675395] INFO: task txg_sync:4738 blocked for more than 120 seconds.
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675400] txg_sync D ffff88016fa53f40 0 4738 2 0x00000000
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675404] ffff88007af2bb98 0000000000000046 ffff880034511740 ffff88007af2bfd8
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675407] ffff88007af2bfd8 ffff88007af2bfd8 ffff880076bd1740 ffff880034511740
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675409] ffff880034511740 ffff88016fa547f8 ffff880059d294b0 0000000000000001
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675412] Call Trace:
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675419] [] schedule+0x29/0x70
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675422] [] io_schedule+0x8f/0xd0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675443] [] cv_wait_common+0xa8/0x1b0 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675447] [] ? finish_wait+0x80/0x80
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675454] [] __cv_wait_io+0x18/0x20 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675482] [] zio_wait+0x103/0x1a0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675501] [] dsl_pool_sync+0x348/0x5d0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675522] [] spa_sync+0x3e8/0xa60 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675525] [] ? ktime_get_ts+0x48/0xe0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675546] [] txg_sync_thread+0x323/0x590 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675567] [] ? txg_init+0x250/0x250 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675573] [] thread_generic_wrapper+0x78/0x90 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675579] [] ? __thread_create+0x340/0x340 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675582] [] kthread+0xc0/0xd0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675584] [] ? kthread_create_on_node+0x120/0x120
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675586] [] ret_from_fork+0x7c/0xb0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675588] [] ? kthread_create_on_node+0x120/0x120
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675591] INFO: task mount.zfs:20919 blocked for more than 120 seconds.
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675593] mount.zfs D ffff88016fa13f40 0 20919 1 0x00000004
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675595] ffff88012ab95758 0000000000000082 ffff880166685d00 ffff88012ab95fd8
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675598] ffff88012ab95fd8 ffff88012ab95fd8 ffff880076ad8000 ffff880166685d00
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675600] ffff880166685d00 ffff88016fa147f8 ffff8800028977e0 0000000000000001
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675603] Call Trace:
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675605] [] schedule+0x29/0x70
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675608] [] io_schedule+0x8f/0xd0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675615] [] cv_wait_common+0xa8/0x1b0 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675617] [] ? finish_wait+0x80/0x80
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675623] [] __cv_wait_io+0x18/0x20 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675643] [] zio_wait+0x103/0x1a0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675656] [] dbuf_read+0x314/0x7e0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675670] [] __dbuf_hold_impl+0x3cc/0x450 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675682] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675694] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675707] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675719] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675732] [] __dbuf_hold_impl+0x16c/0x450 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675745] [] dbuf_hold_impl+0x8d/0xc0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675757] [] dbuf_hold+0x20/0x40 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675773] [] dnode_hold_impl+0x2d0/0x5f0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675778] [] ? __kmalloc+0x13d/0x170
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675785] [] ? kmem_alloc_debug+0x96/0x3b0 [spl]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675800] [] dnode_hold+0x19/0x20 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675814] [] dmu_buf_hold+0x4a/0x1b0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675816] [] ? mutex_lock+0x1d/0x50
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675837] [] zap_lockdir+0x5a/0x760 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675856] [] ? zrl_init+0x36/0x40 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675859] [] ? mutex_lock+0x1d/0x50
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675879] [] zap_lookup_norm+0x4a/0x190 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675898] [] zap_lookup+0x33/0x40 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675917] [] zfs_get_zplprop+0x57/0xc0 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675936] [] zfs_sb_create+0xa6/0x560 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675938] [] ? down_write+0x12/0x40
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675957] [] zfs_domount+0x2f/0x270 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675960] [] ? get_anon_bdev+0x110/0x110
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675978] [] ? zpl_mount+0x30/0x30 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675996] [] zpl_fill_super+0xe/0x20 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.675998] [] mount_nodev+0x56/0xb0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676016] [] zpl_mount+0x25/0x30 [zfs]
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676021] [] mount_fs+0x43/0x1b0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676025] [] vfs_kern_mount+0x74/0x110
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676028] [] do_mount+0x21f/0xac0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676032] [] ? __get_free_pages+0xe/0x50
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676035] [] ? copy_mount_options+0x36/0x170
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676037] [] sys_mount+0x8e/0xe0
Aug 9 17:25:53 pmb-LinuxBookAir kernel: [23843.676039] [] system_call_fastpath+0x1a/0x1f
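For anyone comparing traces like the one above across reports, here is a rough sketch of a parser that groups hung-task messages by task and keeps only the reliable stack frames (those not marked with `?`). It is an illustration only: the regular expressions and function names are my own assumptions, written against the log format shown here.

```python
import re

# "INFO: task txg_sync:4738 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")
# "[<addr>] schedule+0x29/0x70" or "[] ? finish_wait+0x80/0x80 [spl]";
# the address may be missing, as it is in the trace pasted above.
FRAME_RE = re.compile(
    r"\[<?[0-9a-f]*>?\]\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_hung_tasks(lines):
    """Return one dict per hung-task report: task, pid, and its frames."""
    reports = []
    current = None
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            current = {"task": m.group(1), "pid": int(m.group(2)), "frames": []}
            reports.append(current)
            continue
        m = FRAME_RE.search(line)
        # Frames prefixed with "?" are unreliable stack leftovers; skip them.
        if m and current is not None and not m.group(1):
            current["frames"].append((m.group(2), m.group(3)))
    return reports


def module_frames(report, module):
    """Function names from one report that belong to the given module."""
    return [name for name, mod in report["frames"] if mod == module]


if __name__ == "__main__":
    import sys

    for report in parse_hung_tasks(sys.stdin):
        print(report["task"], report["pid"], module_frames(report, "zfs"))
```

Fed the log above on standard input, this prints one line per report, which makes it easier to see that both txg_sync and mount.zfs are waiting in zio_wait.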