Duplicate device when trying to import pool at boot #1528
Comments
@gcbirzan Could you elaborate on which sources you used to build ZFS when this issue occurred, and on how you could tell that there was a duplicate device? That said, the presence of zpool.cache is an important distinction. At present, the module initialization entry point is _init() in module/zfs/zfs_ioctl.c. _init() invokes spa_init(), which invokes spa_config_load(), which reads zpool.cache. _init() also invokes zvol_init(), which invokes zvol_create_minors(). This occurs while the pool is still in an uninitialized state, and it means we call zvol_create_minors() again when zfs_ioc_pool_import is called to open the pool. That should be okay, because the repeated calls to __zvol_create_minor() should fail with EEXIST, but it sounds like that is not happening. I need more information to be certain, but pull request #1477 eliminates the zvol_create_minors() call in zvol_init(), which might address your problem. On the other hand, it does not explain the nature of these duplicate devices or how they appeared in the first place. The duplicate devices could be caused by udev rather than by the kernel code. Without more information, it is impossible for me to tell.
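For context, the EEXIST behavior mentioned above relies on a guard at the top of __zvol_create_minor(): if a minor is already registered under the requested name, the second call bails out instead of creating a duplicate device. The following is a minimal sketch of that pattern, not the actual ZFS source; zvol_find_by_name(), zvol_state_lock, and the surrounding structure are simplified stand-ins for illustration.

```c
/*
 * Minimal sketch (not the actual ZFS source) of the guard that should
 * make a repeated __zvol_create_minor() call for the same volume fail
 * with EEXIST. zvol_find_by_name() and zvol_state_lock are simplified
 * stand-ins for the real state-tracking code.
 */
static int
__zvol_create_minor(const char *name)
{
	zvol_state_t *zv;

	ASSERT(MUTEX_HELD(&zvol_state_lock));

	/* A minor already registered under this name is a duplicate. */
	zv = zvol_find_by_name(name);
	if (zv != NULL)
		return (EEXIST);

	/* ... allocate zvol_state_t, set up the request queue and gendisk ... */

	return (0);
}
```

If that guard works as intended, calling zvol_create_minors() twice (once from zvol_init() and once from zfs_ioc_pool_import) should be harmless, which is why duplicate devices point either at a failure of this check or at something outside the kernel code, such as udev.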
As an additional comment, invoking zvol_create_minors() when the pool is uninitialized is a bad thing to do. Pull request #1477 should fix that. The commit message does not say that it fixes this, which is something I should correct.
I had a chat with @gcbirzan and @behlendorf in IRC. I missed a few relevant details in the paste. At the moment, my best guess is that either a bit flip changed zd0 to something else (causing us to think zd0 was free) or the compiler generated bad code. I cannot see any other explanation for this issue.
@gcbirzan Was this a one-time event, or can you reproduce it?
The following error will occur on some (possibly all) kernels because blk_init_queue() will try to take the spinlock before we initialize it.

[ 5.538871] BUG: spinlock bad magic on CPU#0, zpool/4054
[ 5.538885] lock: 0xffff88021a73de60, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 5.538888] Pid: 4054, comm: zpool Not tainted 3.9.3 openzfs#11
[ 5.538890] Call Trace:
[ 5.538898] [<ffffffff81478ef8>] spin_dump+0x8c/0x91
[ 5.538902] [<ffffffff81478f1e>] spin_bug+0x21/0x26
[ 5.538906] [<ffffffff812da097>] do_raw_spin_lock+0x127/0x130
[ 5.538911] [<ffffffff81253301>] ? zvol_probe+0x91/0xf0
[ 5.538914] [<ffffffff8147d851>] _raw_spin_lock_irq+0x21/0x30
[ 5.538919] [<ffffffff812c2c1e>] cfq_init_queue+0x1fe/0x350
[ 5.538922] [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[ 5.538926] [<ffffffff812aacb8>] elevator_init+0x78/0x140
[ 5.538930] [<ffffffff812b2677>] blk_init_allocated_queue+0x87/0xb0
[ 5.538933] [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[ 5.538937] [<ffffffff812b26d5>] blk_init_queue_node+0x35/0x70
[ 5.538941] [<ffffffff812b271e>] blk_init_queue+0xe/0x10
[ 5.538944] [<ffffffff8125211b>] __zvol_create_minor+0x24b/0x620
[ 5.538947] [<ffffffff81253264>] zvol_create_minors_cb+0x24/0x30
[ 5.538952] [<ffffffff811bd9ca>] dmu_objset_find_spa+0xea/0x510
[ 5.538955] [<ffffffff81253240>] ? zvol_free+0x60/0x60
[ 5.538958] [<ffffffff811bda71>] dmu_objset_find_spa+0x191/0x510
[ 5.538962] [<ffffffff81253240>] ? zvol_free+0x60/0x60
[ 5.538965] [<ffffffff81253ea2>] zvol_create_minors+0x92/0x180
[ 5.538969] [<ffffffff811f8d80>] spa_open_common+0x250/0x380
[ 5.538973] [<ffffffff811f8ece>] spa_open+0xe/0x10
[ 5.538977] [<ffffffff8122817e>] pool_status_check.part.22+0x1e/0x80
[ 5.538980] [<ffffffff81228a55>] zfsdev_ioctl+0x155/0x190
[ 5.538984] [<ffffffff8116a695>] do_vfs_ioctl+0x325/0x5a0
[ 5.538989] [<ffffffff81163f1d>] ? final_putname+0x1d/0x40
[ 5.538992] [<ffffffff8116a950>] sys_ioctl+0x40/0x80
[ 5.538996] [<ffffffff814812c9>] ? do_page_fault+0x9/0x10
[ 5.539000] [<ffffffff81483929>] system_call_fastpath+0x16/0x1b
[ 5.541118] zd0: unknown partition table

We fix this by calling spin_lock_init() before blk_init_queue(). The manner in which zvol_init() initializes structures is susceptible to a race between initialization and a probe on a zvol, so we reorganize zvol_init() to prevent that. Lastly, calling zvol_create_minors(NULL) in zvol_init() does nothing because no pools are imported at that point, so we remove it.

Signed-off-by: Richard Yao <[email protected]>
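In other words, the core of the fix is an ordering change: the spinlock handed to blk_init_queue() is taken by the block layer during elevator setup, so it must be initialized first. A sketch of the before/after, with zv_lock and zv_queue used as illustrative field names (the exact change is in the pull request):

```c
/*
 * Sketch of the ordering fix. blk_init_queue() passes the spinlock to
 * the elevator code, which locks it during initialization, so the lock
 * must be initialized first. Field names are illustrative here.
 */

/* Before (broken): the block layer takes zv->zv_lock while its magic
 * is still zero, triggering the "spinlock bad magic" BUG above. */
zv->zv_queue = blk_init_queue(zvol_request, &zv->zv_lock);

/* After (fixed): initialize the lock before handing it to the block layer. */
spin_lock_init(&zv->zv_lock);
zv->zv_queue = blk_init_queue(zvol_request, &zv->zv_lock);
```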
@gcbirzan The zvol changes have been merged into master. Are you still able to recreate this issue? If not, I'm going to close this.
The traceback is at https://gist.github.com/gcbirzan/9c1741574323cf1edc3d/raw/c5cc8e0cf112cf5527964a8cf9d5e9b6075828f9/gistfile1.txt
Not sure if related, but we rebooted without zpool.cache and imported the pool, which seemed to fix it.