Deadlock in zpool, txg_sync and udev/vol_id when importing the pool #1862

Closed

FransUrbo opened this issue Nov 13, 2013 · 80 comments

@FransUrbo
Contributor

This is a fork of the #1848 issue.

When importing the pool without the 'zvol_inhibit_dev' parameter, I get (after a few minutes) an oops from first zpool and txg_sync, and then from vol_id (the one from udev, NOT zvol_id), and all vol_id processes are tagged Dead (177 of them, although I only have 64 VDEVs). The import is hung and I can no longer issue any zfs command...

[ 1935.855833] SPL: Loaded module v0.6.2-1.20131110_crypto
[ 1935.856315] zunicode: module license 'CDDL' taints kernel.
[ 1935.856318] Disabling lock debugging due to kernel taint
[ 1935.883882] ZFS: Loaded module v0.6.2-1.20131113_crypto, ZFS pool version 5000, ZFS filesystem version 5
[ 1972.592902] SPL: using hostid 0xa8c00845
[ 1974.925020]  zd0: unknown partition table
[ 1975.171192]  zd16: p1 p2 p3
[ 1975.380597]  zd32: p1 p2 p3
[ 1975.425234]  zd48: unknown partition table
[ 1975.469974]  zd64: unknown partition table
[ 1975.536150]  zd80: unknown partition table
[ 1975.811834]  zd96: p1
[ 1975.811834]  p1: <solaris: [s0] p5 [s2] p6 [s8] p7 >
[ 1976.009411]  zd112: unknown partition table
[ 1976.304591]  zd128: p1
[ 1976.619496]  zd144: p1
[ 1976.619496]  p1: <solaris: [s0] p5 [s2] p6 [s8] p7 >
[ 1976.909568]  zd160: p1 p2 p3
[ 1977.038737]  zd176: unknown partition table
[ 1977.345266]  zd192: p1 p2 < p5 >
[ 1977.624824]  zd208: p1 p2 < p5 >
[ 1977.828811]  zd224: p1
[ 1978.041224]  zd240: p1
[ 1978.248323]  zd256: p1
[ 1978.364182]  zd272: unknown partition table
[ 1978.618128]  zd288: p1
[ 1978.854242]  zd304: p1
[ 1979.071706]  zd320: p1 p2 p3
[ 1979.257299]  zd336: p1 p2 p3
[ 1979.440590]  zd352: p1
[ 1979.589259]  zd368: p1
[ 1979.809191]  zd384: p1 p2 p3
[ 1980.091709]  zd400: p1 p2 < p5 >
[ 1980.396174]  zd416: p1 p2 < p5 >
[ 1980.708611]  zd432: p1 p2 p3
[ 1980.896684]  zd448: p1 p2 < p5 >
[ 1981.201037]  zd464: p1 p2 < p5 >
[ 1981.381464]  zd480: p1
[ 1981.584330]  zd496: unknown partition table
[ 1981.837717]  zd512: p1
[ 1982.139536]  zd528: p1 p2 < p5 >
[ 1982.340551]  zd544: p1
[ 1982.432025]  zd560: p1
[ 1982.658010]  zd576: p1 p2
[ 1982.750527]  zd592: p1
[ 1982.835009]  zd608: p1 p2
[ 1983.062772]  zd624: p1
[ 1983.379149]  zd640: p1 p2 < p5 >
[ 1983.671411]  zd656: p1 p2 < p5 >
[ 1983.953109]  zd672: p1 p2 p3
[ 1984.134006]  zd688: p1 p2 p3
[ 1984.338892]  zd704: unknown partition table
[ 1984.525932]  zd720: p1
[ 1984.741452]  zd736: p1 p2 p3
[ 1985.062658]  zd752: p1 p2 < p5 >
[ 1985.275333]  zd768: p1 p2 < p5 >
[ 1985.404177]  zd784: unknown partition table
[ 1985.550725]  zd800: unknown partition table
[ 1985.634266]  zd816: unknown partition table
[ 1985.937100]  zd832: p1 p2 p3
[ 1986.033612]  zd848: unknown partition table
[ 1986.322119]  zd864: p1 p2 < p5 >
[ 1986.510430]  zd880: p1 p2
[ 1986.833831]  zd896: p1
[ 1987.015856]  zd912: p1
[ 1987.232678]  zd928: p1 p2
[ 1987.451998]  zd944: unknown partition table
[ 1987.563576]  zd960: unknown partition table
[ 1987.816374]  zd976: p1 p2 p3
[ 1988.165110]  zd992: p1 p2 p3 p4
[ 1988.165110]  p1: <bsd: >
[ 1988.535175]  zd1008: p1 p2 p3 p4
[ 1988.535175]  p1: <bsd: >
[ 2159.714988] INFO: task zpool:23852 blocked for more than 120 seconds.
[ 2159.715077]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.715157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.715269] zpool           D ffff88041fcd4ac0     0 23852      1 0x20020000
[ 2159.715273]  ffff88040e029000 0000000000200086 0000000000000007 0000000000000008
[ 2159.715277]  ffff8803ef1ca010 0000000000014ac0 0000000000014ac0 ffff88040e029000
[ 2159.715279]  0000000000014ac0 ffff8803ef1cbfd8 ffff8803ef1cbfd8 0000000000014ac0
[ 2159.715282] Call Trace:
[ 2159.715288]  [<ffffffff8111ee39>] ? __kmalloc+0x138/0x1b5
[ 2159.715294]  [<ffffffffa0767034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
[ 2159.715298]  [<ffffffffa0766a1b>] ? nvs_operation+0x7e/0x280 [znvpair]
[ 2159.715302]  [<ffffffffa0767009>] ? nvs_native_nvp_op+0x1e0/0x1ec [znvpair]
[ 2159.715305]  [<ffffffff811fdad7>] ? clear_page_c+0x7/0x10
[ 2159.715309]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.715311]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.715314]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.715316]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.715337]  [<ffffffffa07d404f>] ? spa_open_common+0x51/0x2f4 [zfs]
[ 2159.715339]  [<ffffffff8111ee39>] ? __kmalloc+0x138/0x1b5
[ 2159.715344]  [<ffffffffa06e65b6>] ? kmem_alloc_debug+0x11d/0x2c9 [spl]
[ 2159.715362]  [<ffffffffa07d5d3a>] ? spa_get_stats+0x35/0x228 [zfs]
[ 2159.715365]  [<ffffffff810ef59f>] ? release_pages+0x197/0x1a6
[ 2159.715380]  [<ffffffffa07ffe4d>] ? zfs_ioc_pool_stats+0x21/0x57 [zfs]
[ 2159.715395]  [<ffffffffa0802efa>] ? zfsdev_ioctl+0x335/0x3c0 [zfs]
[ 2159.715398]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.715413]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.715416]  [<ffffffff81104fd7>] ? handle_mm_fault+0x9c3/0x9f8
[ 2159.715418]  [<ffffffff8110fe8f>] ? unlink_anon_vmas+0xee/0x17b
[ 2159.715421]  [<ffffffff81045eed>] ? do_page_fault+0x3ed/0x439
[ 2159.715424]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[ 2159.715426]  [<ffffffff8110a038>] ? remove_vma+0x44/0x4b
[ 2159.715428]  [<ffffffff8110b590>] ? do_munmap+0x2dd/0x2f9
[ 2159.715430]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.715443] INFO: task zpool:24089 blocked for more than 120 seconds.
[ 2159.715525]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.715605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.715716] zpool           D ffff88041fd14ac0     0 24089  24068 0x20020000
[ 2159.715718]  ffff88040da28000 0000000000200086 ffff88041fd14ac0 ffff88041d8f6010
[ 2159.715721]  ffff8803e7c60010 0000000000014ac0 0000000000014ac0 ffff88040da28000
[ 2159.715723]  0000000000014ac0 ffff8803e7c61fd8 ffff8803e7c61fd8 0000000000014ac0
[ 2159.715725] Call Trace:
[ 2159.715728]  [<ffffffffa0767034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
[ 2159.715731]  [<ffffffffa07665fc>] ? nvs_native_nvlist+0x53/0x7d [znvpair]
[ 2159.715734]  [<ffffffffa0766a1b>] ? nvs_operation+0x7e/0x280 [znvpair]
[ 2159.715738]  [<ffffffffa06e4bc3>] ? kmem_free_debug+0xc5/0x10d [spl]
[ 2159.715740]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[ 2159.715744]  [<ffffffffa06ee2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[ 2159.715759]  [<ffffffffa07dbcc3>] ? spa_lookup+0x4b/0x50 [zfs]
[ 2159.715762]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.715776]  [<ffffffffa07ca570>] ? rrw_enter_read+0x40/0x130 [zfs]
[ 2159.715791]  [<ffffffffa07bd572>] ? dsl_pool_hold+0x43/0x4c [zfs]
[ 2159.715792]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.715805]  [<ffffffffa07a49c9>] ? dmu_objset_hold+0x1d/0x78 [zfs]
[ 2159.715809]  [<ffffffffa06e65b6>] ? kmem_alloc_debug+0x11d/0x2c9 [spl]
[ 2159.715822]  [<ffffffffa07ff64d>] ? zfs_ioc_dataset_list_next+0x48/0x140 [zfs]
[ 2159.715835]  [<ffffffffa0802efa>] ? zfsdev_ioctl+0x335/0x3c0 [zfs]
[ 2159.715838]  [<ffffffff810e4b89>] ? find_get_page+0x18/0x6c
[ 2159.715840]  [<ffffffff810e63d2>] ? generic_file_aio_read+0x557/0x5f6
[ 2159.715842]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[ 2159.715845]  [<ffffffff8112967e>] ? vfs_read+0xa8/0x14c
[ 2159.715847]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.715849] INFO: task txg_sync:24095 blocked for more than 120 seconds.
[ 2159.715932]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.716012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.716123] txg_sync        D ffff88041fcd4ac0     0 24095      2 0x00000000
[ 2159.716125]  ffff8803ed34f000 0000000000000046 ffff8803e6aae010 0000000000014ac0
[ 2159.716127]  ffff8803e6aae010 0000000000014ac0 0000000000014ac0 ffff8803ed34f000
[ 2159.716129]  0000000000014ac0 ffff8803e6aaffd8 ffff8803e6aaffd8 0000000000014ac0
[ 2159.716132] Call Trace:
[ 2159.716134]  [<ffffffff810185f9>] ? sched_clock+0x13/0x16
[ 2159.716136]  [<ffffffff810767b1>] ? update_rq_clock+0x1e/0xec
[ 2159.716139]  [<ffffffff810711fa>] ? check_preempt_curr+0x36/0x62
[ 2159.716141]  [<ffffffff81071302>] ? ttwu_do_wakeup+0xf/0xab
[ 2159.716142]  [<ffffffff8101865f>] ? read_tsc+0x18/0x2e
[ 2159.716144]  [<ffffffff81077cfd>] ? try_to_wake_up+0x22d/0x23f
[ 2159.716147]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[ 2159.716150]  [<ffffffffa06ee2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[ 2159.716153]  [<ffffffffa06ee30e>] ? cv_wait_common+0x110/0x164 [spl]
[ 2159.716155]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.716170]  [<ffffffffa07ca4ed>] ? rrw_enter_write+0x29/0x6c [zfs]
[ 2159.716185]  [<ffffffffa07c4905>] ? dsl_sync_task_sync+0x7e/0xca [zfs]
[ 2159.716199]  [<ffffffffa07be2de>] ? dsl_pool_sync+0x37c/0x4d9 [zfs]
[ 2159.716201]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.716216]  [<ffffffffa07ce7a0>] ? spa_sync+0x506/0x941 [zfs]
[ 2159.716218]  [<ffffffff8106acd9>] ? autoremove_wake_function+0x9/0x2e
[ 2159.716221]  [<ffffffff8107370c>] ? __wake_up+0x30/0x44
[ 2159.716235]  [<ffffffffa07df0b6>] ? txg_sync_thread+0x334/0x54b [zfs]
[ 2159.716239]  [<ffffffffa06e4bc3>] ? kmem_free_debug+0xc5/0x10d [spl]
[ 2159.716253]  [<ffffffffa07ded82>] ? txg_thread_exit+0x2b/0x2b [zfs]
[ 2159.716257]  [<ffffffffa06e76dc>] ? __thread_create+0x2e4/0x2e4 [spl]
[ 2159.716260]  [<ffffffffa06e7746>] ? thread_generic_wrapper+0x6a/0x73 [spl]
[ 2159.716262]  [<ffffffff8106a781>] ? kthread+0xb0/0xba
[ 2159.716264]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
[ 2159.716266]  [<ffffffff813c590c>] ? ret_from_fork+0x7c/0xb0
[ 2159.716268]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
[ 2159.716270] INFO: task vol_id:24103 blocked for more than 120 seconds.
[ 2159.716352]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.716432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.716543] vol_id          D ffff88041fc94ac0     0 24103      1 0x20020000
[ 2159.716545]  ffff8803ef277000 0000000000000086 ffffffff8107c64f ffffffff810eb3ea
[ 2159.716547]  ffff8803e66be010 0000000000014ac0 0000000000014ac0 ffff8803ef277000
[ 2159.716549]  0000000000014ac0 ffff8803e66bffd8 ffff8803e66bffd8 0000000000014ac0
[ 2159.716551] Call Trace:
[ 2159.716554]  [<ffffffff8107c64f>] ? dequeue_task_fair+0x4a3/0xa4d
[ 2159.716556]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[ 2159.716558]  [<ffffffff813c3d63>] ? __schedule+0x754/0x7ea
[ 2159.716560]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.716562]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.716564]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[ 2159.716567]  [<ffffffffa06ee2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[ 2159.716582]  [<ffffffffa07dbcc3>] ? spa_lookup+0x4b/0x50 [zfs]
[ 2159.716584]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.716599]  [<ffffffffa07ca570>] ? rrw_enter_read+0x40/0x130 [zfs]
[ 2159.716613]  [<ffffffffa07bd572>] ? dsl_pool_hold+0x43/0x4c [zfs]
[ 2159.716625]  [<ffffffffa07a48dd>] ? dmu_objset_own+0x2e/0xfd [zfs]
[ 2159.716638]  [<ffffffffa0826d67>] ? zvol_open+0xaf/0x276 [zfs]
[ 2159.716641]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.716643]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.716645]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.716647]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.716649]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.716651]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.716653]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.716655]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.716657]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.716659]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.716661]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.716663]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.716665]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.716667]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.716669]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.716671]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.716673]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.716675]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.716677]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.716679] INFO: task vol_id:24112 blocked for more than 120 seconds.
[ 2159.716761]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.716840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.716952] vol_id          D ffff88041fd54ac0     0 24112      1 0x20020000
[ 2159.716953]  ffff8803e65c0000 0000000000000086 0000000000000029 ffff88040affe080
[ 2159.716955]  ffff8803e7f9e010 0000000000014ac0 0000000000014ac0 ffff8803e65c0000
[ 2159.716958]  0000000000014ac0 ffff8803e7f9ffd8 ffff8803e7f9ffd8 0000000000014ac0
[ 2159.716960] Call Trace:
[ 2159.716962]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.716964]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.716966]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.716968]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.716970]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.716972]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.716975]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.716977]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.716989]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.716991]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.716993]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.716995]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.716997]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.716999]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717001]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.717003]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717005]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.717006]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.717008]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.717010]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.717012]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.717014]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.717016]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.717018]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.717020]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717022]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.717023]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.717025]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.717027] INFO: task vol_id:24120 blocked for more than 120 seconds.
[ 2159.717109]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.717189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.717300] vol_id          D ffff88041fd54ac0     0 24120      1 0x20020000
[ 2159.717302]  ffff8803e6753000 0000000000000086 ffff8803e6776000 000088040ab18780
[ 2159.717304]  ffff8803e6776010 0000000000014ac0 0000000000014ac0 ffff8803e6753000
[ 2159.717306]  0000000000014ac0 ffff8803e6777fd8 ffff8803e6777fd8 0000000000014ac0
[ 2159.717308] Call Trace:
[ 2159.717310]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[ 2159.717312]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.717315]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.717317]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.717319]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.717320]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.717332]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.717335]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.717336]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.717339]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.717340]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.717342]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717344]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.717346]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717348]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.717350]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.717352]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.717354]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.717356]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.717358]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.717360]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.717361]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.717363]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717365]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.717367]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.717369]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.717371] INFO: task vol_id:24128 blocked for more than 120 seconds.
[ 2159.717453]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.717533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.717644] vol_id          D ffff88041fc14ac0     0 24128      1 0x20020000
[ 2159.717645]  ffff880409ec6000 0000000000000086 ffff8803e6756000 000088040ccb17c0
[ 2159.717648]  ffff8803e6756010 0000000000014ac0 0000000000014ac0 ffff880409ec6000
[ 2159.717650]  0000000000014ac0 ffff8803e6757fd8 ffff8803e6757fd8 0000000000014ac0
[ 2159.717652] Call Trace:
[ 2159.717654]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[ 2159.717656]  [<ffffffff81104fd7>] ? handle_mm_fault+0x9c3/0x9f8
[ 2159.717658]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.717660]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.717662]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.717664]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.717676]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.717678]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.717680]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.717682]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.717684]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.717686]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717688]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.717690]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717692]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.717694]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.717695]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.717697]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.717699]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.717701]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.717703]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.717705]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.717707]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717709]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.717711]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.717713]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.717714] INFO: task vol_id:24136 blocked for more than 120 seconds.
[ 2159.717797]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.717876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.717987] vol_id          D ffff88041fc14ac0     0 24136      1 0x20020000
[ 2159.717989]  ffff8803e676f000 0000000000000086 0000000000000029 ffff88040bd15e80
[ 2159.717991]  ffff8803e605c010 0000000000014ac0 0000000000014ac0 ffff8803e676f000
[ 2159.717993]  0000000000014ac0 ffff8803e605dfd8 ffff8803e605dfd8 0000000000014ac0
[ 2159.717995] Call Trace:
[ 2159.717997]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.717999]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.718001]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.718003]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.718006]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.718008]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.718010]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.718012]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.718024]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.718026]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.718028]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.718030]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.718031]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.718033]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718035]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.718037]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718039]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.718041]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.718043]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.718045]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.718047]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.718049]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.718050]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.718052]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.718054]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718056]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.718058]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.718060]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.718062] INFO: task vol_id:24144 blocked for more than 120 seconds.
[ 2159.718144]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.718223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.718334] vol_id          D ffff88041fcd4ac0     0 24144      1 0x20020000
[ 2159.718336]  ffff8803e605f000 0000000000000086 0000000000000029 ffff88040cd0aac0
[ 2159.718338]  ffff8803e661e010 0000000000014ac0 0000000000014ac0 ffff8803e605f000
[ 2159.718340]  0000000000014ac0 ffff8803e661ffd8 ffff8803e661ffd8 0000000000014ac0
[ 2159.718343] Call Trace:
[ 2159.718345]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.718347]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.718349]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.718351]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.718353]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.718355]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.718357]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.718359]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.718371]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.718373]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.718375]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.718377]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.718379]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.718381]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718383]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.718385]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718387]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.718388]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.718390]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.718392]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.718394]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.718396]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.718398]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.718400]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.718402]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718404]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.718405]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.718407]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.718409] INFO: task vol_id:24152 blocked for more than 120 seconds.
[ 2159.718492]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.718572] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.718683] vol_id          D ffff88041fc14ac0     0 24152      1 0x20020000
[ 2159.718685]  ffff8803e6750000 0000000000000086 0000000000000029 ffff88040ccb16c0
[ 2159.718687]  ffff8803e61c8010 0000000000014ac0 0000000000014ac0 ffff8803e6750000
[ 2159.718689]  0000000000014ac0 ffff8803e61c9fd8 ffff8803e61c9fd8 0000000000014ac0
[ 2159.718691] Call Trace:
[ 2159.718693]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.718695]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.718697]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.718699]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.718701]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.718703]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.718705]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.718707]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.718719]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.718721]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.718723]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.718725]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.718727]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.718729]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718731]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.718733]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718735]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.718737]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.718738]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.718740]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.718742]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.718744]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.718746]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.718748]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.718750]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718752]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.718754]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.718756]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13

Yes, this is zfs-crypto, but it is now completely in line with ZoL/master, so I'm a little unsure whether it has something to do with the crypto code or with ZoL...

Since I only have one filesystem that is currently encrypted, I'm prepared to 'downgrade' (i.e. remove the crypto feature) to do some testing with ZoL/master if needed.

@behlendorf
Contributor

This looks like some sort of deadlock between pool import and zvol device creation. The fact that suppressing zvol creation with zvol_inhibit_dev avoids the problem supports that. Also, just to be clear, this isn't an OOPS; it's just a deadlock which triggered the slow task watchdog. Still, it's absolutely not good.
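
To be explicit, suppressing zvol device creation just means loading the module with the zvol_inhibit_dev parameter set; roughly like this (where you set it persistently depends on your distro, this is only a sketch):

# one-off, when loading the module by hand
modprobe zfs zvol_inhibit_dev=1
# or persistently, via a modprobe options file
echo "options zfs zvol_inhibit_dev=1" >> /etc/modprobe.d/zfs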

@FransUrbo
Contributor Author

@behlendorf Any idea how I should proceed [with finding out what/where the deadlock might be]?

Setting zvol_inhibit_dev to 0 and creating a new volume works fine. The ZVOL device is where it's supposed to be in /dev. However, accessing it doesn't seem possible.

I ran fdisk on it after creating the ZVOL, and that hung, as did 'txg_sync'. I got that not-an-oops again, and all writes to the filesystems froze...
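
For the record, the sequence was roughly the following (the volume name and size here are made up):

zfs create -V 10G share/testvol
ls -l /dev/zvol/share/testvol        # the device link shows up as expected
fdisk -l /dev/zvol/share/testvol     # this hangs, together with txg_sync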

@FransUrbo
Contributor Author

Doing some more debugging, after cherry-picking some stuff that looks fine, I have:

 4287 ?        S<s    0:01 udevd --daemon
10882 ?        S<     0:00  \_ udevd --daemon
10883 ?        D<     0:01  |   \_ /sbin/zpool list

Doing an strace on these three processes, I see:

[pid  4287] stat64("/dev/.udev/failed", {st_mode=S_IFDIR|0755, st_size=2420, ...}) = 0
[pid  4287] rename("/dev/.udev/queue/4319", "/dev/.udev/failed/\\x2fblock\\x2fzd1024") = 0
[pid  4287] rmdir("/dev/.udev/queue")   = -1 ENOTEMPTY (Directory not empty)
[pid  4287] waitpid(-1, 0xffd91a70, WNOHANG) = 0
[pid  4287] select(7, [3 4 5 6], NULL, NULL, NULL

It then hangs on the select.

Looking in /dev/.udev/:

celia:~# ll /dev/.udev/
total 4
drwxr-xr-x    2 root root 18160 Nov 18 08:39 db/
drwxr-xr-x    2 root root  4700 Nov 18 08:46 failed/
drwxr-xr-x 1944 root root 38880 Nov 18 08:39 names/
drwxr-xr-x    2 root root    60 Nov 18 08:37 rules.d/
-rw-r--r--    1 root root     5 Nov 18 08:40 uevent_seqnum
celia:~# ll /dev/.udev/failed/ | head
total 0
lrwxrwxrwx 1 root root 10 Nov 18 08:39 \x2fblock\x2fzd0 -> /block/zd0
lrwxrwxrwx 1 root root 13 Nov 18 08:40 \x2fblock\x2fzd1008 -> /block/zd1008
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p1 -> /block/zd1008/zd1008p1
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p2 -> /block/zd1008/zd1008p2
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p3 -> /block/zd1008/zd1008p3
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p4 -> /block/zd1008/zd1008p4
lrwxrwxrwx 1 root root 13 Nov 18 08:40 \x2fblock\x2fzd1024 -> /block/zd1024
lrwxrwxrwx 1 root root 12 Nov 18 08:39 \x2fblock\x2fzd112 -> /block/zd112
lrwxrwxrwx 1 root root 12 Nov 18 08:39 \x2fblock\x2fzd128 -> /block/zd128
celia:~# ll /dev/.udev/failed/|tail
lrwxrwxrwx 1 root root 32 Nov 18 08:37 \x2fdevices\x2fpci0000:00\x2f0000:00:18.1 -> /devices/pci0000:00/0000:00:18.1
lrwxrwxrwx 1 root root 32 Nov 18 08:37 \x2fdevices\x2fpci0000:00\x2f0000:00:18.2 -> /devices/pci0000:00/0000:00:18.2
lrwxrwxrwx 1 root root 32 Nov 18 08:37 \x2fdevices\x2fpci0000:00\x2f0000:00:18.4 -> /devices/pci0000:00/0000:00:18.4
lrwxrwxrwx 1 root root 34 Nov 18 08:37 \x2fdevices\x2fplatform\x2fSP5100 TCO timer -> /devices/platform/SP5100 TCO timer
lrwxrwxrwx 1 root root 40 Nov 18 08:37 \x2fdevices\x2fplatform\x2fplatform-framebuffer.0 -> /devices/platform/platform-framebuffer.0
lrwxrwxrwx 1 root root 25 Nov 18 08:39 \x2fdevices\x2fvirtual\x2fmisc\x2fzfs -> /devices/virtual/misc/zfs
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2f05901221-D566-11D1-B2F0-00A0C9062910 -> /devices/virtual/wmi/05901221-D566-11D1-B2F0-00A0C9062910
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2f466747A0-70EC-11DE-8A39-0800200C9A66 -> /devices/virtual/wmi/466747A0-70EC-11DE-8A39-0800200C9A66
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2f97845ED0-4E6D-11DE-8A39-0800200C9A66 -> /devices/virtual/wmi/97845ED0-4E6D-11DE-8A39-0800200C9A66
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2fABBC0F72-8EA1-11D1-00A0-C90629100000 -> /devices/virtual/wmi/ABBC0F72-8EA1-11D1-00A0-C90629100000
celia:~# ll /dev/.udev/db/| head
total 3624
-rw-r--r-- 1 root root  30 Nov 18 08:37 \x2fblock\x2floop0
-rw-r--r-- 1 root root  30 Nov 18 08:37 \x2fblock\x2floop1
-rw-r--r-- 1 root root  33 Nov 18 08:37 \x2fblock\x2floop10
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop100
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop101
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop102
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop103
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop104
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop105
celia:~# ll /dev/.udev/names/| head
total 0
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:0/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:1/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:10/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:11/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:12/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:13/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:14/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:15/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:2/
celia:~# ll /dev/.udev/names/block\\x2f1\:0/
total 0
-rw-r--r-- 1 root root 0 Nov 18 08:37 \x2fblock\x2fram0

Since the process list indicates that it's the 'zpool list' that hangs, running this under strace shows:

[....]
access("/usr/bin/net", X_OK)            = 0
access("/proc/net/iet/volume", F_OK)    = -1 ENOENT (No such file or directory)
stat64("/sys/kernel/scst_tgt", 0xfff16794) = -1 ENOENT (No such file or directory)
access("/sys/module/zfs", F_OK)         = 0
open("/dev/zfs", O_RDWR|O_LARGEFILE)    = 3
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
open("/etc/dfs/sharetab", O_RDONLY|O_LARGEFILE) = 5
open("/dev/zfs", O_RDWR|O_LARGEFILE)    = 6
fstat64(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf772b000
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, ""..., 4096)                    = 0
ioctl(3, ITE_GPIO_OUT_STATUS

and then it hangs at the ioctl...
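
(fd 3 is /dev/zfs, per the open() above; strace presumably just mis-decodes the private ZFS ioctl number as ITE_GPIO_OUT_STATUS.) To see where a hung zpool is stuck inside the kernel, something like this should work, assuming the kernel exposes /proc/<pid>/stack (10883 being the hung 'zpool list' PID from the ps output earlier):

ls -l /proc/10883/fd/3      # confirm the fd really is /dev/zfs
cat /proc/10883/stack       # kernel-side stack of the hung process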

@FransUrbo
Contributor Author

After a restart (so I could try again):

 4313 ?        S<s    0:01 udevd --daemon
11816 ?        S<     0:00  \_ udevd --daemon
11819 ?        D<     0:00  |   \_ /lib/udev/vol_id --export /dev/.tmp-230-17

and running that vol_id under strace, I see:

set_thread_area({entry_number:-1 -> 12, base_addr:0xf75afad0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xf76e8000, 4096, PROT_READ)   = 0
munmap(0xf76f6000, 89847)               = 0
open("/dev/.tmp-230-17", O_RDONLY|O_LARGEFILE

So maybe it's LARGEFILE support that's the problem?!

celia:/dev# df -h .
Filesystem            Size  Used Avail Use% Mounted on
udev                   10M  3.6M  6.5M  36% /dev

@FransUrbo
Contributor Author

Well, I couldn't find anything that supports that, but what's (also) weird:

celia:~# ll /dev/.udev/db/*zd*
-rw-r--r-- 1 root root 29 Nov 18 10:47 /dev/.udev/db/\x2fdevices\x2fvirtual\x2ftty\x2fptyzd
-rw-r--r-- 1 root root 29 Nov 18 10:47 /dev/.udev/db/\x2fdevices\x2fvirtual\x2ftty\x2fttyzd

Shouldn't the zd* devices be there? I CAN see that udev IS reading/loading the ZoL rules...

@FransUrbo
Contributor Author

Might be something wrong with udev:

celia:~# udevinfo --export-db | grep zd
P: /devices/virtual/tty/ptyzd
N: ptyzd
P: /devices/virtual/tty/ttyzd
N: ttyzd
celia:~# 
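
For completeness, querying one of the zvol nodes directly might show whether udev knows about it at all; a sketch using the old udevinfo tool this system has (device name arbitrary):

udevinfo -q all -n /dev/zd0     # query the udev database for the node
udevinfo -a -p /block/zd0       # walk the sysfs attributes for the device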

@FransUrbo
Contributor Author

Running zvol_id under strace:

stat64("/dev/.tmp-230-1008", {st_mode=S_IFBLK|0600, st_rdev=makedev(230, 1008), ...}) = 0
open("/dev/.tmp-230-1008", O_RDONLY|O_LARGEFILE

@FransUrbo
Contributor Author

A full log of a strace of the original (mother) udev, a child and the zpool list processes can be found at http://bayour.com/misc/strace_zpool_import-full.txt.

I'm currently looking at it myself, but it's 18M and I can't see anything obvious :(

I also put the strace log from a manual zpool list at http://bayour.com/misc/strace_zpool_import.txt just for completeness and there I can see that the ioctl() hangs on /dev/zfs.

@FransUrbo
Contributor Author

Any access to a ZVOL, even with 'zvol_inhibit_dev=1', hangs txg_sync. Creating one works, but any access to it hangs...

@FransUrbo
Contributor Author

@behlendorf do you have any input? Any idea how to proceed?

@FransUrbo
Contributor Author

Loading the module without zvol_inhibit_dev and then restarting udevd gives:

celia:~# /etc/init.d/udev start
Starting the hotplug events dispatcher: udevd.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...

After the udevadm settle timeout, the events queue contains:

5065: /devices/virtual/misc/zfs
5699: /devices/virtual/block/zd0
5701: /devices/virtual/block/zd16
5702: /devices/virtual/block/zd16/zd16p1
5703: /devices/virtual/block/zd16/zd16p2
5704: /devices/virtual/block/zd16/zd16p3
[....]
5934: /devices/virtual/block/zd1008
5935: /devices/virtual/block/zd1008/zd1008p1
5936: /devices/virtual/block/zd1008/zd1008p2
5937: /devices/virtual/block/zd1008/zd1008p3
5938: /devices/virtual/block/zd1008/zd1008p4


done (timeout).

Then trying to run 'zpool list' hangs zpool, txg_sync and vol_id as usual.

@FransUrbo
Contributor Author

Disabling the ZoL udev rules completely does not change anything. Still get the hang...

@FransUrbo
Contributor Author

The syslog after the module is loaded (includes output from the sysrq 't' key): http://bayour.com/misc/syslog_zpool_import.txt.

@FransUrbo
Contributor Author

Now I can't even mount all filesystems! It fails after a while.

[  191.386413] ZFS: Loaded module v0.6.2-1.20131123_crypto_iscsi_mine_cherry, ZFS pool version 5000, ZFS filesystem version 5
[  393.786148] SPL: using hostid 0xa8c00845
[  721.064113] INFO: task mount.zfs:11173 blocked for more than 120 seconds.
[  721.064202]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  721.064282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  721.064395] mount.zfs       D ffff88041fc54ac0     0 11173  11172 0x20020000
[  721.064399]  ffff8803f0f3e000 0000000000200086 ffff8803f08d7bb8 00000000000009fc
[  721.064402]  ffff8803f08d6010 0000000000014ac0 0000000000014ac0 ffff8803f0f3e000
[  721.064405]  0000000000014ac0 ffff8803f08d7fd8 ffff8803f08d7fd8 0000000000014ac0
[  721.064407] Call Trace:
[  721.064415]  [<ffffffff81110ddf>] ? __insert_vmap_area+0x75/0xc1
[  721.064420]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  721.064427]  [<ffffffffa07bf2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[  721.064431]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  721.064435]  [<ffffffffa07b72fc>] ? vmem_alloc_debug+0x104/0x2a1 [spl]
[  721.064456]  [<ffffffffa08afa33>] ? txg_wait_synced+0x134/0x156 [zfs]
[  721.064470]  [<ffffffffa08f56a3>] ? zpl_mount+0x21/0x21 [zfs]
[  721.064484]  [<ffffffffa08ebad9>] ? zil_replay+0x79/0xe8 [zfs]
[  721.064508]  [<ffffffffa08dc3ba>] ? zfs_sb_setup+0xe6/0x109 [zfs]
[  721.064521]  [<ffffffffa08dc9c0>] ? zfs_domount+0x247/0x2c3 [zfs]
[  721.064523]  [<ffffffff8112a52b>] ? get_anon_bdev+0xd9/0xd9
[  721.064535]  [<ffffffffa08f56ac>] ? zpl_fill_super+0x9/0xd [zfs]
[  721.064537]  [<ffffffff8112bb28>] ? mount_nodev+0x46/0x7f
[  721.064549]  [<ffffffffa08f569e>] ? zpl_mount+0x1c/0x21 [zfs]
[  721.064551]  [<ffffffff8112b968>] ? mount_fs+0x6a/0x14b
[  721.064554]  [<ffffffff81142cee>] ? vfs_kern_mount+0x60/0xde
[  721.064556]  [<ffffffff81143577>] ? do_mount+0x7e2/0x953
[  721.064558]  [<ffffffff8116836f>] ? compat_sys_mount+0x1b4/0x1f0
[  721.064561]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  840.951724] INFO: task mount.zfs:11173 blocked for more than 120 seconds.
[  840.951811]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  840.951892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.952003] mount.zfs       D ffff88041fc54ac0     0 11173  11172 0x20020000
[  840.952007]  ffff8803f0f3e000 0000000000200086 ffff8803f08d7bb8 00000000000009fc
[  840.952010]  ffff8803f08d6010 0000000000014ac0 0000000000014ac0 ffff8803f0f3e000
[  840.952013]  0000000000014ac0 ffff8803f08d7fd8 ffff8803f08d7fd8 0000000000014ac0
[  840.952015] Call Trace:
[  840.952021]  [<ffffffff81110ddf>] ? __insert_vmap_area+0x75/0xc1
[  840.952034]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  840.952039]  [<ffffffffa07bf2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[  840.952042]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  840.952046]  [<ffffffffa07b72fc>] ? vmem_alloc_debug+0x104/0x2a1 [spl]
[  840.952064]  [<ffffffffa08afa33>] ? txg_wait_synced+0x134/0x156 [zfs]
[  840.952077]  [<ffffffffa08f56a3>] ? zpl_mount+0x21/0x21 [zfs]
[  840.952090]  [<ffffffffa08ebad9>] ? zil_replay+0x79/0xe8 [zfs]
[  840.952103]  [<ffffffffa08dc3ba>] ? zfs_sb_setup+0xe6/0x109 [zfs]
[  840.952116]  [<ffffffffa08dc9c0>] ? zfs_domount+0x247/0x2c3 [zfs]
[  840.952118]  [<ffffffff8112a52b>] ? get_anon_bdev+0xd9/0xd9
[  840.952131]  [<ffffffffa08f56ac>] ? zpl_fill_super+0x9/0xd [zfs]
[  840.952133]  [<ffffffff8112bb28>] ? mount_nodev+0x46/0x7f
[  840.952145]  [<ffffffffa08f569e>] ? zpl_mount+0x1c/0x21 [zfs]
[  840.952147]  [<ffffffff8112b968>] ? mount_fs+0x6a/0x14b
[  840.952149]  [<ffffffff81142cee>] ? vfs_kern_mount+0x60/0xde
[  840.952151]  [<ffffffff81143577>] ? do_mount+0x7e2/0x953
[  840.952153]  [<ffffffff8116836f>] ? compat_sys_mount+0x1b4/0x1f0
[  840.952156]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13

Output from a sysrq 't' key: http://bayour.com/misc/syslog_zfs_mount_all.txt.

Since I see a 'txg_wait_synced' in there, I checked the 'zfs_txg_timeout' value, and it's at the default '5'. I'll reboot and increase that to ... 10?
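
(For reference, the value can also be checked and bumped at runtime, no reboot needed, via the usual module parameter path:)

cat /sys/module/zfs/parameters/zfs_txg_timeout     # shows the default 5 here
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout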

@FransUrbo
Contributor Author

Trying to remove the cache device (after a successful import with zfs_txg_timeout=10):

Nov 27 12:54:29 Celia kernel: [  481.292664] INFO: task l2arc_feed:10711 blocked for more than 120 seconds.
Nov 27 12:54:29 Celia kernel: [  481.292753]       Tainted: P           O 3.12.0+scst+tf.1 #5
Nov 27 12:54:29 Celia kernel: [  481.292834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 12:54:29 Celia kernel: [  481.292947] l2arc_feed      D ffff88041fd14ac0     0 10711      2 0x00000000
Nov 27 12:54:29 Celia kernel: [  481.292950]  ffff8800dc95d000 0000000000000046 ffff880037ef9fd8 ffff880037ef9fd8
Nov 27 12:54:29 Celia kernel: [  481.292954]  ffff880037ef8010 0000000000014ac0 0000000000014ac0 ffff8800dc95d000
Nov 27 12:54:29 Celia kernel: [  481.292957]  0000000000014ac0 ffff880037ef9fd8 ffff880037ef9fd8 0000000000014ac0
Nov 27 12:54:29 Celia kernel: [  481.292959] Call Trace:
Nov 27 12:54:29 Celia kernel: [  481.292967]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
Nov 27 12:54:29 Celia kernel: [  481.292972]  [<ffffffff81059d71>] ? lock_timer_base+0x26/0x4b
Nov 27 12:54:29 Celia kernel: [  481.292974]  [<ffffffff81059de0>] ? try_to_del_timer_sync+0x4a/0x54
Nov 27 12:54:29 Celia kernel: [  481.292976]  [<ffffffff81059e24>] ? del_timer_sync+0x3a/0x44
Nov 27 12:54:29 Celia kernel: [  481.292979]  [<ffffffff813c1a61>] ? schedule_timeout+0x191/0x1af
Nov 27 12:54:29 Celia kernel: [  481.292982]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
Nov 27 12:54:29 Celia kernel: [  481.292984]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
Nov 27 12:54:29 Celia kernel: [  481.292987]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
Nov 27 12:54:29 Celia kernel: [  481.292989]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
Nov 27 12:54:29 Celia kernel: [  481.293007]  [<ffffffffa076d589>] ? l2arc_feed_thread+0xa3/0x998 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293020]  [<ffffffffa076d4e6>] ? arc_freed+0x91/0x91 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293025]  [<ffffffffa06c36dc>] ? __thread_create+0x2e4/0x2e4 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293029]  [<ffffffffa06c3746>] ? thread_generic_wrapper+0x6a/0x73 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293031]  [<ffffffff8106a781>] ? kthread+0xb0/0xba
Nov 27 12:54:29 Celia kernel: [  481.293034]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
Nov 27 12:54:29 Celia kernel: [  481.293036]  [<ffffffff813c590c>] ? ret_from_fork+0x7c/0xb0
Nov 27 12:54:29 Celia kernel: [  481.293038]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
Nov 27 12:54:29 Celia kernel: [  481.293056] INFO: task zpool:11009 blocked for more than 120 seconds.
Nov 27 12:54:29 Celia kernel: [  481.293138]       Tainted: P           O 3.12.0+scst+tf.1 #5
Nov 27 12:54:29 Celia kernel: [  481.293218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 12:54:29 Celia kernel: [  481.293329] zpool           D ffff88041fd54ac0     0 11009  10625 0x20020000
Nov 27 12:54:29 Celia kernel: [  481.293331]  ffff8803f60c9000 0000000000200086 0000000000200282 0000000000000018
Nov 27 12:54:29 Celia kernel: [  481.293333]  ffff88040b21e010 0000000000014ac0 0000000000014ac0 ffff8803f60c9000
Nov 27 12:54:29 Celia kernel: [  481.293336]  0000000000014ac0 ffff88040b21ffd8 ffff88040b21ffd8 0000000000014ac0
Nov 27 12:54:29 Celia kernel: [  481.293338] Call Trace:
Nov 27 12:54:29 Celia kernel: [  481.293342]  [<ffffffffa0743034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293353]  [<ffffffffa076ce65>] ? arc_buf_remove_ref+0xde/0xea [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293356]  [<ffffffffa0742a1b>] ? nvs_operation+0x7e/0x280 [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293359]  [<ffffffffa0743009>] ? nvs_native_nvp_op+0x1e0/0x1ec [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293362]  [<ffffffffa0743034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293365]  [<ffffffff810e4b89>] ? find_get_page+0x18/0x6c
Nov 27 12:54:29 Celia kernel: [  481.293367]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
Nov 27 12:54:29 Celia kernel: [  481.293371]  [<ffffffffa06ca2d2>] ? cv_wait_common+0xd4/0x164 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293373]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
Nov 27 12:54:29 Celia kernel: [  481.293374]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
Nov 27 12:54:29 Celia kernel: [  481.293391]  [<ffffffffa07b7d47>] ? spa_config_enter+0x9b/0xf0 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293405]  [<ffffffffa07b811f>] ? spa_vdev_config_enter+0x16/0x22 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293420]  [<ffffffffa07ac8a4>] ? spa_vdev_remove+0x40/0x330 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293424]  [<ffffffffa06c25da>] ? kmem_alloc_debug+0x141/0x2c9 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293437]  [<ffffffffa07dc651>] ? zfs_ioc_vdev_remove+0x33/0x4f [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293451]  [<ffffffffa07dee62>] ? zfsdev_ioctl+0x326/0x3ba [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293454]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
Nov 27 12:54:29 Celia kernel: [  481.293457]  [<ffffffff8110a038>] ? remove_vma+0x44/0x4b
Nov 27 12:54:29 Celia kernel: [  481.293459]  [<ffffffff8110b590>] ? do_munmap+0x2dd/0x2f9
Nov 27 12:54:29 Celia kernel: [  481.293461]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13

@FransUrbo
Contributor Author

Has #938 resurfaced? My txg_sync process is also at 100% after a call trace like #1862 (comment), which hangs 'zfs mount -av' at 12/628.

During the import:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10716 root      20   0  8864 2320 1676 R  101  0.0  27:56.87 zpool
10937 root       0 -20     0    0    0 R  101  0.0  27:57.50 txg_sync

101%!? Cool! :)

Trying options wildly.

celia:~# grep ^options /etc/modprobe.d/zfs
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=536870912
options zfs zvol_threads=64
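
In human units those byte values are (just converting, nothing more):

# zfs_arc_max = 4294967296 bytes = 4 * 1024^3   = 4 GiB
# zfs_arc_min =  536870912 bytes = 512 * 1024^2 = 512 MiB
echo $((10 * 1024 * 1024 * 1024))    # = 10737418240, the 10 GiB value tried below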

@FransUrbo
Contributor Author

A zpool import (no mount) is still running after almost an hour. Increasing the arc_max to 10GB doesn't seem to change anything (as yet). No call trace or hung process though, that's good. I'm going to let it run and see what happens...

Load average is at 2.10/2.20/2.11, an strace on the zpool process doesn't show anything (and yet it runs at 100% CPU), and I don't know how to trace a kernel process (txg_sync, also at 100%)...
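
The closest I can get is reading its kernel-side stack (assuming the kernel exposes /proc/<pid>/stack) or another sysrq 't' dump like before:

cat /proc/10937/stack        # 10937 = txg_sync from the top output above
echo t > /proc/sysrq-trigger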

@FransUrbo
Contributor Author

After running zpool import for 80 minutes, with a huge max ARC size, memory is barely touched:

celia:~# free
             total       used       free     shared    buffers     cached
Mem:      16407640     665372   15742268          0      83192     176252
-/+ buffers/cache:     405928   16001712
Swap:      8393956          0    8393956

@FransUrbo
Contributor Author

A 'zdb share' gives

celia:~# zdb share
zdb: can't open 'share': Input/output error

An strace output of this can be found at http://bayour.com/misc/strace_zdb.txt.

@dweeezil
Contributor

@FransUrbo Not much information there. Try doing an strace -o <file> -ff. You'll find that it generates a ton of subprocesses.

@FransUrbo
Contributor Author

@dweeezil which process do you want me to strace? I think I've done it on every one I could think of...

@FransUrbo
Contributor Author

celia:~# strace -o /tmp/x -ff zdb share
zdb: can't open 'share': Input/output error
upeek: ptrace(PTRACE_PEEKUSER,4587,44,0): No such process

And there's no /tmp/x file...

Trying again:

celia:~# strace -o /tmp/x -ff zdb share
zdb: can't open 'share': Input/output error
celia:~# ll /tmp/x
/bin/ls: cannot access /tmp/x: No such file or directory

@dweeezil
Contributor

@FransUrbo The zdb command forks/clones itself a lot, and most of the interesting action that might help figure out your Input/output error is in one of the forked processes. The processes are all zdb. When you use "-o /tmp/x -ff", the output files will be named "/tmp/x.<pid>" and there should be a whole lot of them. Did you look in /tmp to see if there were any files there?

# strace -o /tmp/x -ff zdb junk1 > /dev/null
# ls /tmp/x.*
/tmp/x.21772  /tmp/x.21797  /tmp/x.21822  /tmp/x.21847 ...

I'd suspect that one of those trace files should give a clue as to where your I/O error is occurring.

@FransUrbo
Contributor Author

Ah! I've never used double F's, single yes, but not double. Much simpler than getting it all in one file :)

Uploaded to http://bayour.com/misc/zdb_share/.

@dweeezil
Contributor

@FransUrbo The strace output doesn't show anything terribly interesting because a lot of the work is being done in the kernel. The first thing I'd do is to rule out a corrupted cache file. I'd also suggest backing off to a simple zdb -d share which won't try to do so much.

In order to check your cache file, I'd suggest renaming it with mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.save. When running zdb with no cache file, you need to add the "-e" option and I also recommend an appropriate "-p" option which from the looks of your strace output would end up as zdb -e -p /dev/disk/by-id .... However, once the cache file is renamed, it would probably be worth backing off to testing with zpool import. Either try a plain zpool import or a zpool import -d /dev/disk/by-id and see whether it can find the pool. Once that works, you can start trying various zdb commands as indicated above.
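
In other words, something along these lines (adjust the pool name and paths as needed):

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.save
zpool import                            # or: zpool import -d /dev/disk/by-id
zpool import -d /dev/disk/by-id share
# once the import works again, retry zdb without the cache file:
zdb -e -p /dev/disk/by-id -d share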

Your strace output shows the following devices being opened:

/dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F0X41F-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4XK4T-part1
/dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F0X3P4-part1
/dev/disk/by-id/scsi-SATA_ST1500DL003-9VT_5YD1F2KF-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4VT5X-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS3SAWS-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4WM30-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS0H3A9-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4WWPA-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4Q38C-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4VT5R-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4R2MJ-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4Q3F4-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS13W11-part1
/dev/disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0WJSN-part1

Are those the only devices in your pool?

@FransUrbo
Contributor Author

I'm going to try the different zdb command lines, but in the meantime - those disks look like all of them. 15 disks (3x3TB + 12x1.5TB). I also have a 120GB Corsair SSD as a cache device, but I've removed it from the system, not from the pool (can't - any zpool remove/offline etc. gives me a lockup).

        NAME                                            STATE     READ WRITE CKSUM
        share                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0WJSN          ONLINE       0     0     0
            ata-ST3000DM001-9YN166_Z1F0X41F             ONLINE       0     0     0
            ata-ST3000DM001-9YN166_Z1F0X3P4             ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4XK4T             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4Q3F4             ONLINE       0     0     0
            scsi-SATA_ST1500DL003-9VT_5YD1F2KF          ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS3SAWS             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4R2MJ             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS13W11             ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4VT5R             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4Q38C             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4WM30             ONLINE       0     0     0
          raidz1-4                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4VT5X             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4WWPA             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS0H3A9             ONLINE       0     0     0
        cache
          ata-Corsair_Force_3_SSD_11486508000008952122  UNAVAIL      0     0     0

Strangely enough (and probably unrelated to the issue), all the disks that are prefixed with ata- used to be prefixed with scsi- as well a few days ago.

@FransUrbo
Contributor Author

Running 'zdb -d share' gave an I/O error:

celia(SSD1):~# strace -o /tmp/strace_-d_share -ff zdb -d share
zdb: can't open 'share': Input/output error

The strace output can be found at http://bayour.com/misc/strace_zdb_-d_share/.

Loading the module, importing the pool, removing the cache file, rebooting (just to be sure :), and loading the module again shows 'no pools available'.
Importing the pool with 'zpool import -d /dev/disk/by-id -aNf' (had to force it for some reason) worked just fine.

Exporting the pool and then running 'zdb -e -p /dev/disk/by-id' gave http://bayour.com/misc/strace_zdb_-e_-p/zdb_-e_-p.out, and the strace output is at http://bayour.com/misc/strace_zdb_-e_-p

@FransUrbo
Contributor Author

Installing a non-debug version of spl/zfs GIT master makes it possible to do a zfs send AND to create both snapshots and filesystems. But not volumes.

So compiling with debug interferes with the code in some way.
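
(Debug vs. non-debug here just means the usual ZoL configure switch, i.e. roughly:)

./configure --enable-debug && make    # debug build - this is the one that misbehaves
./configure && make                   # default, non-debug build - send/snapshot/filesystem creation work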

@FransUrbo
Contributor Author

zdb -[dbS] share still gives an I/O error...

behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 13, 2013
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace.  This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel.  This significantly simplified the code.

However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux.  This patch
aims to rectify that by finally removing ZFS_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel.  Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes.  This
will minimize, but not eliminate, the disruption to end users.

Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code.  While
each platform must have its own custom zvol.c implementation the
interfaces provided are consistent.

NOTES:

1) This patch introduces one subtle change in behavior which
   could not be easily avoided.  Prior to this change callers
   of 'zfs create -V ...' were guaranteed that upon exit the
   /dev/zvol/ block device link would be created or an error
   returned.  That's no longer the case.  The utilities will no
   longer block waiting for the symlink to be created.  Callers
   are now responsible for blocking as needed and this is why
   a 'sleep 1' was added to zconfig.sh.

2) The read-only behavior of a ZVOL now solely depends on if
   the ZVOL_RDONLY bit is set in zv->zv_flags.  The redundant
   policy setting in the gendisk structure was removed.  This
   both simplifies the code and allows us to safely leverage
   set_disk_ro() to issue a KOBJ_CHANGE uevent.

3) Because __zvol_create_minor() and zvol_alloc() may now be
   called in a sync task they must now use KM_PUSHPAGE.

References:
  illumos-gate/illumos@681d9761e8516a7dc5ab6589e2dfe717777e1123

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1862
@behlendorf
Copy link
Contributor

@FransUrbo I was wrong in my comment above; the deadlock I thought I saw in the master code doesn't exist. Nor have I been able to reproduce what you're seeing in any of my testing. That, coupled with the fact that no one else has reported anything like this, makes me strongly suspect the deadlock was accidentally introduced by zfsrogue's zfs-crypto changes.

For the reasons you've mentioned above I'd prefer not to look at the code, but perhaps you could. The question which needs to be answered is why the dsl_pool_config lock doesn't get dropped in the following call stack. In the master code it's taken at the start of dmu_objset_find_impl, but it's dropped here, here, and here before invoking the callback. It's then reacquired after the callback returns. Does the zfs-crypto code do this the same way?

spl_debug_bug+0x8d/0xd6 [spl]
txg_wait_synced+0x60/0x257 [zfs]
zil_destroy+0x28/0x3b8 [zfs]
__zvol_create_minor+0x565/0x7bc [zfs]
zvol_create_minors_cb+0x1d/0x21 [zfs]
dmu_objset_find_impl+0x473/0x489 [zfs]
dmu_objset_find+0x44/0x66 [zfs]
zvol_create_minors+0xa8/0x17f [zfs]
spa_import+0x709/0x71a [zfs]
zfs_ioc_pool_import+0xad/0xf4 [zfs]
zfsdev_ioctl+0x39a/0x42e [zfs]
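
For reference, the enter/exit-around-callback pattern in question looks roughly like this (a minimal sketch with simplified names; this is not the actual ZoL dmu_objset_find_impl() source):

/*
 * Minimal sketch of the lock discipline described above.  The
 * dsl_pool_config lock is held while walking the dataset namespace,
 * dropped around the user callback (e.g. zvol_create_minors_cb), and
 * re-taken afterwards.  Names are illustrative only.
 */
static int
objset_find_sketch(dsl_pool_t *dp, const char *name,
    int (*func)(const char *, void *), void *arg)
{
        int error;

        dsl_pool_config_enter(dp, FTAG);

        /* ... walk child datasets/snapshots while holding the lock ... */

        /*
         * Drop the config lock before calling the callback so that the
         * callback can itself take dsl_pool_hold()/dsl_pool_config_enter()
         * (as __zvol_create_minor() does) without self-deadlocking.
         */
        dsl_pool_config_exit(dp, FTAG);
        error = func(name, arg);
        dsl_pool_config_enter(dp, FTAG);

        dsl_pool_config_exit(dp, FTAG);
        return (error);
}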

Also, I don't expect this will fix your issue, but I've pushed an updated version of some long overdue zvol cleanup for review: #1969.

@FransUrbo
Copy link
Contributor Author

I took a line-by-line look at dmu_objset_find_impl() and there is NO difference whatsoever.

Looking at the diff between the zfs-crypto code and ZoL/master, there is only one added dsl_pool_config_enter() and one dsl_pool_config_exit() call, both in the same function (not related to volumes - I didn't even want to say that much, let alone more...).

I have asked 'someone' who knows a lot more about this (both code bases) to take a look; if that person wants to contribute comments directly, fine. If not, I will forward them here.

@FransUrbo
Copy link
Contributor Author

@behlendorf I did find your branch in your own repo shortly after you posted that you had a possible fix. So my findings in #1862 (comment) are with that part applied.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
1862 incremental zfs receive fails for sparse file > 8PB

Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Simon Klinkert <[email protected]>
Approved by: Eric Schrock <[email protected]>

References:
  illumos/illumos-gate@31495a1
  illumos changeset: 13789:f0c17d471b7a
  https://www.illumos.org/issues/1862

Ported-by: Brian Behlendorf <[email protected]>
behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 13, 2013
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace.  This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel.  This significantly simplified the code.

However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux.  This patch
aims to rectify that by finally removing ZFS_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel.  Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes.  This
will minimize, but not eliminate, the disruption to end users.

Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code.  While
each platform must have its own custom zvol.c implementation the
interfaces provided are consistent.

NOTES:

1) This patch introduces one subtle change in behavior which
   could not be easily avoided.  Prior to this change callers
   of 'zfs create -V ...' were guaranteed that upon exit the
   /dev/zvol/ block device link would be created or an error
   returned.  That's no longer the case.  The utilities will no
   longer block waiting for the symlink to be created.  Callers
   are now responsible for blocking; this is why a 'udev_wait'
   call was added to the 'label' function in scripts/common.sh.

2) The read-only behavior of a ZVOL now solely depends on if
   the ZVOL_RDONLY bit is set in zv->zv_flags.  The redundant
   policy setting in the gendisk structure was removed.  This
   both simplifies the code and allows us to safely leverage
   set_disk_ro() to issue a KOBJ_CHANGE uevent.  See the
   comment in the code for further details on this.

3) Because __zvol_create_minor() and zvol_alloc() may now be
   called in a sync task they must use KM_PUSHPAGE.

References:
  illumos-gate/illumos@681d9761e8516a7dc5ab6589e2dfe717777e1123

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1862
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Dec 18, 2013
A debug patch to identify the writing process holding the rrwlock.
If after 60 seconds of blocking a write the lock cannot be taken
print to the console the process name and pid of the holder.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1862
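
In other words, the debug patch amounts to something like the following (an illustrative sketch only; the timed loop, message text and threshold handling are paraphrased, not the actual diff):

/*
 * Sketch of the rrwlock debug idea: if a writer cannot take the lock
 * within 60 seconds, print the holder's name and pid.  The rrw_t
 * fields used here (rr_lock, rr_cv, rr_writer, rr_anon_rcount,
 * rr_linked_rcount) exist in the real code, but this loop is only an
 * approximation of the actual patch.
 */
static void
rrw_enter_write_debug(rrw_t *rrl)
{
        int waited = 0;

        mutex_enter(&rrl->rr_lock);
        while (rrl->rr_writer != NULL ||
            !refcount_is_zero(&rrl->rr_anon_rcount) ||
            !refcount_is_zero(&rrl->rr_linked_rcount)) {
                /* Wait one second at a time, then re-check the lock. */
                (void) cv_timedwait(&rrl->rr_cv, &rrl->rr_lock,
                    ddi_get_lbolt() + hz);

                if (++waited >= 60 && rrl->rr_writer != NULL) {
                        printk(KERN_WARNING "rrwlock blocked >60s, "
                            "held by %s (pid %d)\n",
                            rrl->rr_writer->comm, rrl->rr_writer->pid);
                        waited = 0;     /* only nag once a minute */
                }
        }
        rrl->rr_writer = curthread;
        mutex_exit(&rrl->rr_lock);
}
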
FransUrbo added a commit to FransUrbo/zfs that referenced this issue Dec 18, 2013
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace.  This was inherently racy and
in late 2009 these ioctl()s were removed and everything was
left up to the kernel.

However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux.  This patch
aims to rectify that by finally removing ZFS_IOC_*_MINOR and
moving all the functionality down in to the kernel.  This gets
ZoL effectively back in sync with Illumos and FreeBSD.

This change also updates the zvol_*_minors() functions based
on the FreeBSD implementations.  This was done to avoid a deadlock
which was introduced by the lock restructuring in Illumos 3464.

References:
  illumos-gate/illumos@681d9761e8516a7dc5ab6589e2dfe717777e1123

TODO:
* zvol_set_snapdev() is subject to the same deadlock and must
  be reworked in a similar way to zvol_create_minors().
* __zvol_rename_minor() must be updated to prompt udev.
* Test, test, test.  This code can be subtle and there are quite
  a few cases to get right.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1862
@lundman
Copy link
Contributor

lundman commented Dec 19, 2013

Sorry, always late to the party! Maybe I can help, or see if rogue is around. It is worth noting that this area of the sources is new, so there are no "other" sources to compare against any more.

I can reconfirm that dmu_objset_find_impl is identical, but I did notice that ZoL's dsl_userhold.c: dsl_dataset_user_release_impl() has more releases of the lock than we do.

This is quite a long issue now. @FransUrbo, is there a set of instructions that makes the problem trigger? I assume it is importing with volumes that no longer works?

@FransUrbo
Copy link
Contributor Author

@lundman Correct. As soon as I don't set zvol_inhibit_dev=1, it hangs; when I DO set it, it works (as in, it imports and mounts, but of course without access to the ZVOLs)...

Also have a look at #1933. I have no idea if it's related, but since I get an invalid value, my thought was that maybe something is wrong with the metadata or txg/dmu (whatever those are :) which triggers the deadlock.

@FransUrbo
Copy link
Contributor Author

I've pushed the exact code I'm using to https://github.com/FransUrbo/zfs/tree/iscsi+crypto+mine_0.6.2+.

---- WARNING ---
That code contains the crypto patch from zfs-rogue (as well as a bunch of other patches/pull requests not yet applied, including the ones mentioned here earlier), so anyone wanting to help with creating a 'true' opensource version of the crypto code in the future SHOULD NOT look at this!!
---- WARNING ---

@lundman
Copy link
Contributor

lundman commented Dec 19, 2013

Ah yep, pretty easy to get stuck waiting on rrwlock (dp_config), especially when using zvol, which is odd :)

@lundman
Copy link
Contributor

lundman commented Dec 19, 2013

Simple printing on lock grab and release gets this output:

Dec 19 15:13:01 zfsdev kernel: [  145.660771] loundmandassss
Dec 19 15:13:01 zfsdev kernel: [  145.660771] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.660771] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688245] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688566] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688573] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688632] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688645] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688701] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688715] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.688761] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.708695] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.708757] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.728668] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.728866] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.728871] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.733254] lock taken by 3421  <-----
Dec 19 15:13:01 zfsdev kernel: [  145.733438] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.737276] lock taken by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.737312] lock relea by 3421
Dec 19 15:13:01 zfsdev kernel: [  145.738223]  zd0: unknown partition table
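
The tracing behind output like that is nothing more than a printk() on each enter/exit, roughly along these lines (a sketch of the idea, not the actual debug diff):

/*
 * Sketch of the "lock taken / lock relea" tracing above: log the pid
 * on every dsl_pool_config enter/exit.  The real functions are thin
 * wrappers around rrw_enter()/rrw_exit() on dp->dp_config_rwlock.
 */
void
dsl_pool_config_enter(dsl_pool_t *dp, void *tag)
{
        rrw_enter(&dp->dp_config_rwlock, RW_READER, tag);
        printk(KERN_DEBUG "lock taken by %d\n", current->pid);
}

void
dsl_pool_config_exit(dsl_pool_t *dp, void *tag)
{
        printk(KERN_DEBUG "lock relea by %d\n", current->pid);
        rrw_exit(&dp->dp_config_rwlock, tag);
}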

Telling dsl_pool_config_enter to panic when already locked (do we often re-enter rrwlocks?) gives this:

Dec 19 15:53:56 zfsdev kernel: [  623.910096]  [<ffffffffa02642be>] dsl_pool_config_enter+0x6e/0x70 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910096]  [<ffffffffa0266319>] dsl_pool_hold+0x49/0x60 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910096]  [<ffffffffa0246460>] dmu_objset_own+0x40/0x140 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910096]  [<ffffffff81424fbb>] ? kobj_lookup+0x10b/0x160
Dec 19 15:53:56 zfsdev kernel: [  623.910096]  [<ffffffffa02e24b2>] zvol_open+0x132/0x300 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910585]  [<ffffffff81311d6b>] ? get_gendisk+0x3b/0x110
Dec 19 15:53:56 zfsdev kernel: [  623.910596]  [<ffffffff811bb340>] __blkdev_get+0xd0/0x4b0
Dec 19 15:53:56 zfsdev kernel: [  623.910603]  [<ffffffff811bb8cd>] blkdev_get+0x1ad/0x300
Dec 19 15:53:56 zfsdev kernel: [  623.910611]  [<ffffffff8119d1bc>] ? unlock_new_inode+0x5c/0x90
Dec 19 15:53:56 zfsdev kernel: [  623.910617]  [<ffffffff811ba912>] ? bdget+0x132/0x150
Dec 19 15:53:56 zfsdev kernel: [  623.910623]  [<ffffffff8131301c>] add_disk+0x37c/0x470
Dec 19 15:53:56 zfsdev kernel: [  623.910637]  [<ffffffffa01828cb>] ? kmem_free_debug+0x4b/0x150 [spl]
Dec 19 15:53:56 zfsdev kernel: [  623.910659]  [<ffffffffa0258993>] ? dsl_dataset_disown+0x63/0x90 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910684]  [<ffffffffa02e357f>] __zvol_create_minor+0x5af/0x640 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910709]  [<ffffffffa02e3f07>] zvol_create_minor+0x27/0x40 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910734]  [<ffffffffa02b0b3e>] zfs_ioc_create_minor+0xe/0x10 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910760]  [<ffffffffa02b84ca>] zfsdev_ioctl+0x4fa/0x570 [zfs]
Dec 19 15:53:56 zfsdev kernel: [  623.910767]  [<ffffffff8114ad04>] ? handle_pte_fault+0x94/0x430
Dec 19 15:53:56 zfsdev kernel: [  623.910774]  [<ffffffff811927b5>] ? release_open_intent+0x35/0x40
Dec 19 15:53:56 zfsdev kernel: [  623.910780]  [<ffffffff811928c8>] ? path_openat+0x108/0x430
Dec 19 15:53:56 zfsdev kernel: [  623.910786]  [<ffffffff8114bdf9>] ? handle_mm_fault+0x259/0x320
Dec 19 15:53:56 zfsdev kernel: [  623.910794]  [<ffffffff8168743c>] ? do_page_fault+0x1cc/0x4e0
Dec 19 15:53:56 zfsdev kernel: [  623.910801]  [<ffffffff81194df9>] do_vfs_ioctl+0x99/0x590
Dec 19 15:53:56 zfsdev kernel: [  623.910807]  [<ffffffff8119fea6>] ? alloc_fd+0xc6/0x110
Dec 19 15:53:56 zfsdev kernel: [  623.910816]  [<ffffffff8116ef2b>] ? kmem_cache_free+0x7b/0x100
Dec 19 15:53:56 zfsdev kernel: [  623.910822]  [<ffffffff8118ddf3>] ? putname+0x33/0x50
Dec 19 15:53:56 zfsdev kernel: [  623.910828]  [<ffffffff81195389>] sys_ioctl+0x99/0xa0
Dec 19 15:53:56 zfsdev kernel: [  623.910835]  [<ffffffff8168495e>] ? do_device_not_available+0xe/0x10
Dec 19 15:53:56 zfsdev kernel: [  623.910842]  [<ffffffff8168ba69>] system_call_fastpath+0x16/0x1b

Although this is the second locker; possibly the one before it is the one at fault.

Which is this guy:

Dec 19 15:58:39 zfsdev kernel: [   87.209904]  [<ffffffffa02612d6>] dsl_pool_config_enter+0x86/0x90 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.209927]  [<ffffffffa0263339>] dsl_pool_hold+0x49/0x60 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.209947]  [<ffffffffa0254578>] dsl_crypto_key_inherit+0x38/0x170 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.209956]  [<ffffffff8103fa29>] ? default_spin_lock_flags+0x9/0x10
Dec 19 15:58:39 zfsdev kernel: [   87.209963]  [<ffffffff8107c6d2>] ? up+0x32/0x50
Dec 19 15:58:39 zfsdev kernel: [   87.209970]  [<ffffffff81053365>] ? console_unlock+0x285/0x3e0
Dec 19 15:58:39 zfsdev kernel: [   87.209984]  [<ffffffffa017fc53>] ? tsd_set+0x103/0x730 [spl]
Dec 19 15:58:39 zfsdev kernel: [   87.209992]  [<ffffffff81676225>] ? printk+0x61/0x63
Dec 19 15:58:39 zfsdev kernel: [   87.210015]  [<ffffffffa0261242>] ? dsl_pool_config_exit+0x32/0x40 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.210037]  [<ffffffffa026337b>] ? dsl_pool_rele+0x2b/0x40 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.210055]  [<ffffffffa02434b6>] ? dmu_objset_own+0x96/0x140 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.210081]  [<ffffffffa02e0138>] __zvol_create_minor+0x148/0x640 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.210089]  [<ffffffff81123c17>] ? unlock_page+0x27/0x30
Dec 19 15:58:39 zfsdev kernel: [   87.210114]  [<ffffffffa02e0f27>] zvol_create_minor+0x27/0x40 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.210140]  [<ffffffffa02adb5e>] zfs_ioc_create_minor+0xe/0x10 [zfs]
Dec 19 15:58:39 zfsdev kernel: [   87.210165]  [<ffffffffa02b54ea>] zfsdev_ioctl+0x4fa/0x570 [zfs]

Which has no fewer than 6 return paths that don't drop the lock.

An early pull request is available here: zfsrogue/zfs-crypto#41

If it is ok, we can ask rogue to merge it.
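
The leak being fixed there is the usual pattern: take the pool hold, then return early on an error path without releasing it. A contrived sketch (not the actual dsl_crypto_key_inherit() code):

/*
 * Contrived illustration of the bug: dsl_pool_hold() takes the
 * dp_config_rwlock as a reader, so any return path that skips
 * dsl_pool_rele() leaves the lock held and later writers on that
 * lock block forever.
 */
static int
key_inherit_sketch(const char *dsname)
{
        dsl_pool_t *dp;
        dsl_dataset_t *ds;
        int error;

        error = dsl_pool_hold(dsname, FTAG, &dp);
        if (error != 0)
                return (error);

        error = dsl_dataset_hold(dp, dsname, FTAG, &ds);
        if (error != 0)
                return (error); /* BUG: pool hold (and config lock) leaked */

        /* ... the actual key-inheritance work would go here ... */

        dsl_dataset_rele(ds, FTAG);
        dsl_pool_rele(dp, FTAG);        /* correct: released on this path */
        return (0);
}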

@FransUrbo
Copy link
Contributor Author

It seems to work. Halfway anyway...

I can import the pool without problems and all device nodes are created. However, when I try to mount the device, I get:

celia(SSD1):~# mount /dev/zvol/share/VirtualMachines/Astrix-part3 /mnt/disk

Message from syslogd@Celia at Dec 19 09:03:29 ...
 kernel:[  291.293658] SPLError: 11607:0:(zio.c:777:zio_write()) ASSERTION(zp->zp_checksum >= ZIO_CHECKSUM_OFF && zp->zp_checksum < ZIO_CHECKSUM_FUNCTIONS && zp->zp_compress >= ZIO_COMPRESS_OFF && zp->zp_compress < ZIO_COMPRESS_FUNCTIONS && DMU_OT_IS_VALID(zp->zp_type) && zp->zp_crypt >= ZIO_CRYPT_OFF && zp->zp_crypt < ZIO_CRYPT_FUNCTIONS && zp->zp_type < DMU_OT_NUMTYPES && zp->zp_level < 32 && zp->zp_copies > 0 && zp->zp_copies <= spa_max_replication(spa)) failed

Message from syslogd@Celia at Dec 19 09:03:29 ...
 kernel:[  291.293999] SPLError: 11607:0:(zio.c:777:zio_write()) SPL PANIC

That's identical to #1933.

@FransUrbo
Copy link
Contributor Author

I'm going to rebuild without debugging and see if it works any better.

@lundman
Copy link
Contributor

lundman commented Dec 19, 2013

That looks familiar; didn't we already decide those ASSERTs need to be updated, or removed?

@FransUrbo
Copy link
Contributor Author

!!WARNING -- possible crypto code in link -- WARNING!!

Not those; those were some other ASSERTs (related to SPL_VERSION) - FransUrbo/zfs-crypto@b7d66cf.

Do note that it's the DMU_OT_IS_VALID(zp->zp_type) part that's failing...
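
For context, DMU_OT_IS_VALID() in dmu.h boils down to roughly the check below (paraphrased; see the tree for the exact macro), so any object type the crypto patch adds has to either be a legacy type below DMU_OT_NUMTYPES or a "new" type encoding a valid byteswap function, otherwise debug builds trip exactly this ASSERT:

/*
 * Approximate definition of DMU_OT_IS_VALID() from dmu.h: "new"
 * style object types carry the DMU_OT_NEWTYPE flag and encode a
 * byteswap function in their low bits; legacy types must simply be
 * below DMU_OT_NUMTYPES.
 */
#define DMU_OT_IS_VALID(ot)     (((ot) & DMU_OT_NEWTYPE) ? \
        ((ot) & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS : \
        (ot) < DMU_OT_NUMTYPES)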

@FransUrbo
Copy link
Contributor Author

OK, so when compiling without debugging it seems to work. I can import and mount filesystems, including the ZVOL I tested, without any problems.

I can't run zdb -[dbS] share (it gives an I/O error), and zpool status -v still gives a malloc error. I don't know if that is related to this problem or to something else, though...

@FransUrbo
Copy link
Contributor Author

@lundman It seems that the fix isn't enough... Everything seems to work, but as soon as I started 'loading' the volumes, I got a hang again.

And after a reboot, I get a kernel panic when trying to import the volume.

@FransUrbo
Copy link
Contributor Author

There's a lot of activity on the disks while it tries to import the pool, and the panic produces page after page of output on an 800x600 screen; I have no idea how to save it to disk. The sysrq keys don't work...

@FransUrbo
Copy link
Contributor Author

After a whole lot of reboots and attempted imports with corresponding panics, I seem to have gotten past the panics at least. Now I 'only' get a deadlock:

Dec 22 12:03:53 Celia kernel: [ 1083.367463] INFO: task zpool:11798 blocked for more than 120 seconds.
Dec 22 12:03:53 Celia kernel: [ 1083.367549]       Tainted: P           O 3.12.0+scst+tf.1 #5
Dec 22 12:03:53 Celia kernel: [ 1083.367629] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 22 12:03:53 Celia kernel: [ 1083.367741] zpool           D ffff88041fd54ac0     0 11798  11094 0x20020000
Dec 22 12:03:53 Celia kernel: [ 1083.367744]  ffff8800ba433000 0000000000200086 000000000001bf90 ffffffff8107b897
Dec 22 12:03:53 Celia kernel: [ 1083.367747]  ffff8800ba476010 0000000000014ac0 0000000000014ac0 ffff8800ba433000
Dec 22 12:03:53 Celia kernel: [ 1083.367750]  0000000000014ac0 ffff8800ba477fd8 ffff8800ba477fd8 0000000000014ac0
Dec 22 12:03:53 Celia kernel: [ 1083.367752] Call Trace:
Dec 22 12:03:53 Celia kernel: [ 1083.367758]  [<ffffffff8107b897>] ? select_task_rq_fair+0x3d9/0x780
Dec 22 12:03:53 Celia kernel: [ 1083.367762]  [<ffffffff810185f9>] ? sched_clock+0x13/0x16
Dec 22 12:03:53 Celia kernel: [ 1083.367764]  [<ffffffff810767b1>] ? update_rq_clock+0x1e/0xec
Dec 22 12:03:53 Celia kernel: [ 1083.367766]  [<ffffffff810711fa>] ? check_preempt_curr+0x36/0x62
Dec 22 12:03:53 Celia kernel: [ 1083.367768]  [<ffffffff81071302>] ? ttwu_do_wakeup+0xf/0xab
Dec 22 12:03:53 Celia kernel: [ 1083.367770]  [<ffffffff81077cfd>] ? try_to_wake_up+0x22d/0x23f
Dec 22 12:03:53 Celia kernel: [ 1083.367774]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
Dec 22 12:03:53 Celia kernel: [ 1083.367780]  [<ffffffffa089e2d2>] ? cv_wait_common+0xd4/0x164 [spl]
Dec 22 12:03:53 Celia kernel: [ 1083.367782]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
Dec 22 12:03:53 Celia kernel: [ 1083.367801]  [<ffffffffa098ed0f>] ? txg_wait_synced+0x134/0x156 [zfs]
Dec 22 12:03:53 Celia kernel: [ 1083.367817]  [<ffffffffa0988962>] ? spa_config_update+0xac/0xe2 [zfs]
Dec 22 12:03:53 Celia kernel: [ 1083.367832]  [<ffffffffa09875c8>] ? spa_import+0x588/0x5c9 [zfs]
Dec 22 12:03:53 Celia kernel: [ 1083.367836]  [<ffffffffa09179f3>] ? nvlist_lookup_common+0x4d/0x94 [znvpair]
Dec 22 12:03:53 Celia kernel: [ 1083.367849]  [<ffffffffa09b0011>] ? zfs_ioc_pool_import+0xad/0xf4 [zfs]
Dec 22 12:03:53 Celia kernel: [ 1083.367852]  [<ffffffffa0931c07>] ? pool_namecheck+0xd1/0x112 [zcommon]
Dec 22 12:03:53 Celia kernel: [ 1083.367865]  [<ffffffffa09b3282>] ? zfsdev_ioctl+0x33b/0x3d5 [zfs]
Dec 22 12:03:53 Celia kernel: [ 1083.367868]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
Dec 22 12:03:53 Celia kernel: [ 1083.367870]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
Dec 22 12:03:53 Celia kernel: [ 1083.367872]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
Dec 22 12:03:53 Celia kernel: [ 1083.367875]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13

@FransUrbo
Copy link
Contributor Author

I'm back to kernel panics.... The few times when it DIDN'T panic must have been a fluke...

@FransUrbo
Copy link
Contributor Author

Importing it readonly seems to have worked.

@FransUrbo
Copy link
Contributor Author

After having it imported readonly and mounting (ro)/unmounting one ZVOL for half a day, I exported the pool and then imported it again writable. And this time I didn't get any panic, deadlock OR SPL error!

This thing is really, really, REALLY starting to piss me off! :). It's impossible to draw any conclusions from anything that's happening and I can't reproduce any problem for long...

Every time the problem has worsened, I think I had been running a scrub (can't be absolutely sure, a million things have happened over the two months I've been having problems) and rebooted halfway through... After that, shit really hit the fan, and it seems like sheer luck that I got back to a semi-stable system...

@FransUrbo
Copy link
Contributor Author

I'm closing this, because it's now quite obvious that it's the zfs-crypto patch that fucked everything up.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 9, 2015
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace.  This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel.  This significantly simplified the code.

However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux.  This patch
aims to rectify that by finally removing ZFS_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel.  Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes.  This
will minimize, but not eliminate, the disruption to end users.

Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code.  While
each platform must have its own custom zvol.c implementation the
interfaces provided are consistent.

NOTES:

1) This patch introduces one subtle change in behavior which
   could not be easily avoided.  Prior to this change callers
   of 'zfs create -V ...' were guaranteed that upon exit the
   /dev/zvol/ block device link would be created or an error
   returned.  That's no longer the case.  The utilities will no
   longer block waiting for the symlink to be created.  Callers
   are now responsible for blocking; this is why a 'udev_wait'
   call was added to the 'label' function in scripts/common.sh.

2) The read-only behavior of a ZVOL now solely depends on if
   the ZVOL_RDONLY bit is set in zv->zv_flags.  The redundant
   policy setting in the gendisk structure was removed.  This
   both simplifies the code and allows us to safely leverage
   set_disk_ro() to issue a KOBJ_CHANGE uevent.  See the
   comment in the code for further details on this.

3) Because __zvol_create_minor() and zvol_alloc() may now be
   called in a sync task they must use KM_PUSHPAGE.

References:
  illumos-gate/illumos@681d9761e8516a7dc5ab6589e2dfe717777e1123

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1862