Duplicate device when trying to import pool at boot #1528

Closed
gcbirzan opened this issue Jun 18, 2013 · 5 comments
Labels
Component: ZVOL ZFS Volumes

gcbirzan commented Jun 18, 2013

https://gist.github.com/gcbirzan/9c1741574323cf1edc3d/raw/c5cc8e0cf112cf5527964a8cf9d5e9b6075828f9/gistfile1.txt is the traceback

Not sure if related, but we rebooted without zpool.cache and imported the pool, which seemed to fix it.

ryao (Contributor) commented Jun 28, 2013

@gcbirzan Would you elaborate on which sources you used to build ZFS when this issue occurred, and how you could tell that there was a duplicate device?

With that said, the presence of zpool.cache is an important distinction. At present, the module initialization entry point is _init() in module/zfs/zfs_ioctl.c. _init() invokes spa_init(), which invokes spa_config_load(), which reads zpool.cache. _init() also invokes zvol_init(), which invokes zvol_create_minors(). This happens while the pool is in an uninitialized state, which means zvol_create_minors() is called again when zfs_ioc_pool_import is invoked to open the pool. That should be okay, because the repeated calls to __zvol_create_minor() should fail with EEXIST, but it sounds like that is not happening.
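
For illustration, here is a minimal sketch of that EEXIST guard; zvol_find_by_name() and zvol_state_t are names from module/zfs/zvol.c, but the body is a simplified stand-in rather than the upstream code:

    static int
    __zvol_create_minor(const char *name)
    {
        zvol_state_t *zv;

        /*
         * A second call for the same volume (for example, once from
         * zvol_init() and again from the pool import path) is expected
         * to stop here and return EEXIST.
         */
        zv = zvol_find_by_name(name);
        if (zv != NULL)
            return (EEXIST);

        /* ... allocate the zvol_state_t, request queue, and gendisk ... */
        return (0);
    }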

I need more information to be certain, but pull request #1477 eliminates the zvol_create_minors() call in zvol_init(), which might address your problem. On the other hand, it does not explain the nature of these duplicate devices or how they appeared in the first place. The duplicate devices could be caused by udev, rather than the kernel code. Without more information, it is impossible for me to tell.

ryao (Contributor) commented Jun 28, 2013

As an additional comment, invoking zvol_create_minors() when the pool is uninitialized is a bad thing to do. Pull request #1477 should fix that. The commit message does not explain that it fixes that, which is something that I should fix.
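
As a rough sketch of what that change amounts to (the registration calls here are simplified, not a copy of the pull request):

    int
    zvol_init(void)
    {
        int error;

        error = register_blkdev(zvol_major, ZVOL_DRIVER);
        if (error)
            return (error);

        blk_register_region(MKDEV(zvol_major, 0), 1UL << MINORBITS,
            NULL, zvol_probe, NULL, NULL);

        /*
         * The zvol_create_minors(NULL) call that used to sit here is
         * gone: no pools are imported at module load, so there are no
         * volumes to create minors for yet.
         */
        return (0);
    }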

ryao (Contributor) commented Jun 28, 2013

I had a chat with @gcbirzan and @behlendorf in IRC. I missed a few relevant details in the paste. At the moment, my best guess is that either a bit flip changed zd0 to something else (causing us to think zd0 was free) or the compiler generated bad code. I cannot see any other explanation for this issue.

behlendorf (Contributor) commented

@gcbirzan Was this a one-time event or can you reproduce it?

ryao referenced this issue in ryao/zfs Jul 2, 2013
The following error will occur on some (possibly all) kernels because
blk_init_queue() will try to take the spinlock before we initialize it.

[    5.538871] BUG: spinlock bad magic on CPU#0, zpool/4054
[    5.538885]  lock: 0xffff88021a73de60, .magic: 00000000, .owner:
<none>/-1, .owner_cpu: 0
[    5.538888] Pid: 4054, comm: zpool Not tainted 3.9.3 openzfs#11
[    5.538890] Call Trace:
[    5.538898]  [<ffffffff81478ef8>] spin_dump+0x8c/0x91
[    5.538902]  [<ffffffff81478f1e>] spin_bug+0x21/0x26
[    5.538906]  [<ffffffff812da097>] do_raw_spin_lock+0x127/0x130
[    5.538911]  [<ffffffff81253301>] ? zvol_probe+0x91/0xf0
[    5.538914]  [<ffffffff8147d851>] _raw_spin_lock_irq+0x21/0x30
[    5.538919]  [<ffffffff812c2c1e>] cfq_init_queue+0x1fe/0x350
[    5.538922]  [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[    5.538926]  [<ffffffff812aacb8>] elevator_init+0x78/0x140
[    5.538930]  [<ffffffff812b2677>] blk_init_allocated_queue+0x87/0xb0
[    5.538933]  [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[    5.538937]  [<ffffffff812b26d5>] blk_init_queue_node+0x35/0x70
[    5.538941]  [<ffffffff812b271e>] blk_init_queue+0xe/0x10
[    5.538944]  [<ffffffff8125211b>] __zvol_create_minor+0x24b/0x620
[    5.538947]  [<ffffffff81253264>] zvol_create_minors_cb+0x24/0x30
[    5.538952]  [<ffffffff811bd9ca>] dmu_objset_find_spa+0xea/0x510
[    5.538955]  [<ffffffff81253240>] ? zvol_free+0x60/0x60
[    5.538958]  [<ffffffff811bda71>] dmu_objset_find_spa+0x191/0x510
[    5.538962]  [<ffffffff81253240>] ? zvol_free+0x60/0x60
[    5.538965]  [<ffffffff81253ea2>] zvol_create_minors+0x92/0x180
[    5.538969]  [<ffffffff811f8d80>] spa_open_common+0x250/0x380
[    5.538973]  [<ffffffff811f8ece>] spa_open+0xe/0x10
[    5.538977]  [<ffffffff8122817e>] pool_status_check.part.22+0x1e/0x80
[    5.538980]  [<ffffffff81228a55>] zfsdev_ioctl+0x155/0x190
[    5.538984]  [<ffffffff8116a695>] do_vfs_ioctl+0x325/0x5a0
[    5.538989]  [<ffffffff81163f1d>] ? final_putname+0x1d/0x40
[    5.538992]  [<ffffffff8116a950>] sys_ioctl+0x40/0x80
[    5.538996]  [<ffffffff814812c9>] ? do_page_fault+0x9/0x10
[    5.539000]  [<ffffffff81483929>] system_call_fastpath+0x16/0x1b
[    5.541118]  zd0: unknown partition table

We fix this by calling spin_lock_init before blk_init_queue.

The manner in which zvol_init() initializes structures is
susceptible to a race between initialization and a probe on a zvol. We
reorganize zvol_init() to prevent that.

Lastly, calling zvol_create_minors(NULL) in zvol_init() does nothing
because no pools are imported, so we remove it.

Signed-off-by: Richard Yao <[email protected]>
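
In rough terms, the ordering fix described in this commit message happens inside __zvol_create_minor(): the spinlock handed to blk_init_queue() must be initialized before the elevator setup (cfq_init_queue() in the trace) takes it. A sketch, with field names following zvol_state_t and an illustrative error label rather than the merged patch:

    /* Initialize the lock before blk_init_queue() tries to take it. */
    spin_lock_init(&zv->zv_lock);
    zv->zv_queue = blk_init_queue(zvol_request, &zv->zv_lock);
    if (zv->zv_queue == NULL)
        goto out_kmem;    /* illustrative error path */
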
behlendorf (Contributor) commented

@gcbirzan The zvol changes have been merged into master; are you still able to recreate this issue? If not, I'm just going to close this.
