Duplicate device when trying to import pool at boot #1528
Comments
@gcbirzan Could you elaborate on which sources you used to build ZFS when this issue occurred, and on how you could tell that there was a duplicate device? That said, the presence of zpool.cache is an important distinction. At present, the module initialization entry point is _init() in module/zfs/zfs_ioctl.c. _init() invokes spa_init(), which invokes spa_config_load(), which reads zpool.cache. _init() also invokes zvol_init(), which invokes zvol_create_minors(). This occurs while the pool is still in an uninitialized state, and it means we call zvol_create_minors() again when zfs_ioc_pool_import is called to open the pool. That should be okay, because the repeated calls to __zvol_create_minor() should fail with EEXIST, but it sounds like that is not happening. I need more information to be certain, but pull request #1477 eliminates the zvol_create_minors() call in zvol_init(), which might address your problem. On the other hand, it does not explain the nature of these duplicate devices or how they appeared in the first place. The duplicate devices could be caused by udev rather than by the kernel code. Without more information, it is impossible for me to tell.
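For context, the EEXIST behavior mentioned above relies on a guard at the top of __zvol_create_minor(): if a minor is already registered under the requested name, the second call bails out instead of creating a duplicate device. The following is a minimal sketch of that pattern, not the actual ZFS source; zvol_find_by_name(), zvol_state_lock, and the surrounding structure are simplified stand-ins for illustration.

```c
/*
 * Minimal sketch (not the actual ZFS source) of the guard that should
 * make a repeated __zvol_create_minor() call for the same volume fail
 * with EEXIST. zvol_find_by_name() and zvol_state_lock are simplified
 * stand-ins for the real state-tracking code.
 */
static int
__zvol_create_minor(const char *name)
{
	zvol_state_t *zv;

	ASSERT(MUTEX_HELD(&zvol_state_lock));

	/* A minor already registered under this name is a duplicate. */
	zv = zvol_find_by_name(name);
	if (zv != NULL)
		return (EEXIST);

	/* ... allocate zvol_state_t, set up the request queue and gendisk ... */

	return (0);
}
```

If that guard works as intended, calling zvol_create_minors() twice (once from zvol_init() and once from zfs_ioc_pool_import) should be harmless, which is why duplicate devices point either at a failure of this check or at something outside the kernel code, such as udev.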
As an additional comment, invoking zvol_create_minors() when the pool is uninitialized is a bad thing to do. Pull request #1477 should fix that. The commit message does not say that it fixes this, which is something I should correct.
I had a chat with @gcbirzan and @behlendorf in IRC. I missed a few relevant details in the paste. At the moment, my best guess is that either a bit flip changed zd0 to something else (causing us to think zd0 was free) or the compiler generated bad code. I cannot see any other explanation for this issue.
@gcbirzan Was this a one-time event, or can you reproduce it?
The following error will occur on some (possibly all) kernels because blk_init_queue() will try to take the spinlock before we initialize it.

[ 5.538871] BUG: spinlock bad magic on CPU#0, zpool/4054
[ 5.538885] lock: 0xffff88021a73de60, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 5.538888] Pid: 4054, comm: zpool Not tainted 3.9.3 openzfs#11
[ 5.538890] Call Trace:
[ 5.538898] [<ffffffff81478ef8>] spin_dump+0x8c/0x91
[ 5.538902] [<ffffffff81478f1e>] spin_bug+0x21/0x26
[ 5.538906] [<ffffffff812da097>] do_raw_spin_lock+0x127/0x130
[ 5.538911] [<ffffffff81253301>] ? zvol_probe+0x91/0xf0
[ 5.538914] [<ffffffff8147d851>] _raw_spin_lock_irq+0x21/0x30
[ 5.538919] [<ffffffff812c2c1e>] cfq_init_queue+0x1fe/0x350
[ 5.538922] [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[ 5.538926] [<ffffffff812aacb8>] elevator_init+0x78/0x140
[ 5.538930] [<ffffffff812b2677>] blk_init_allocated_queue+0x87/0xb0
[ 5.538933] [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[ 5.538937] [<ffffffff812b26d5>] blk_init_queue_node+0x35/0x70
[ 5.538941] [<ffffffff812b271e>] blk_init_queue+0xe/0x10
[ 5.538944] [<ffffffff8125211b>] __zvol_create_minor+0x24b/0x620
[ 5.538947] [<ffffffff81253264>] zvol_create_minors_cb+0x24/0x30
[ 5.538952] [<ffffffff811bd9ca>] dmu_objset_find_spa+0xea/0x510
[ 5.538955] [<ffffffff81253240>] ? zvol_free+0x60/0x60
[ 5.538958] [<ffffffff811bda71>] dmu_objset_find_spa+0x191/0x510
[ 5.538962] [<ffffffff81253240>] ? zvol_free+0x60/0x60
[ 5.538965] [<ffffffff81253ea2>] zvol_create_minors+0x92/0x180
[ 5.538969] [<ffffffff811f8d80>] spa_open_common+0x250/0x380
[ 5.538973] [<ffffffff811f8ece>] spa_open+0xe/0x10
[ 5.538977] [<ffffffff8122817e>] pool_status_check.part.22+0x1e/0x80
[ 5.538980] [<ffffffff81228a55>] zfsdev_ioctl+0x155/0x190
[ 5.538984] [<ffffffff8116a695>] do_vfs_ioctl+0x325/0x5a0
[ 5.538989] [<ffffffff81163f1d>] ? final_putname+0x1d/0x40
[ 5.538992] [<ffffffff8116a950>] sys_ioctl+0x40/0x80
[ 5.538996] [<ffffffff814812c9>] ? do_page_fault+0x9/0x10
[ 5.539000] [<ffffffff81483929>] system_call_fastpath+0x16/0x1b
[ 5.541118] zd0: unknown partition table

We fix this by calling spin_lock_init() before blk_init_queue(). The manner in which zvol_init() initializes structures is susceptible to a race between initialization and a probe on a zvol, so we reorganize zvol_init() to prevent that. Lastly, calling zvol_create_minors(NULL) in zvol_init() does nothing because no pools are imported at that point, so we remove it.

Signed-off-by: Richard Yao <[email protected]>
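In other words, the core of the fix is an ordering change: the spinlock handed to blk_init_queue() is taken by the block layer during elevator setup, so it must be initialized first. A sketch of the before/after, with zv_lock and zv_queue used as illustrative field names (the exact change is in the pull request):

```c
/*
 * Sketch of the ordering fix. blk_init_queue() passes the spinlock to
 * the elevator code, which locks it during initialization, so the lock
 * must be initialized first. Field names are illustrative here.
 */

/* Before (broken): the block layer takes zv->zv_lock while its magic
 * is still zero, triggering the "spinlock bad magic" BUG above. */
zv->zv_queue = blk_init_queue(zvol_request, &zv->zv_lock);

/* After (fixed): initialize the lock before handing it to the block layer. */
spin_lock_init(&zv->zv_lock);
zv->zv_queue = blk_init_queue(zvol_request, &zv->zv_lock);
```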
@gcbirzan The zvol changes have been merged into master. Are you still able to recreate this issue? If not, I'm going to close this.
The traceback is at https://gist.github.com/gcbirzan/9c1741574323cf1edc3d/raw/c5cc8e0cf112cf5527964a8cf9d5e9b6075828f9/gistfile1.txt
Not sure if related, but we rebooted without zpool.cache and imported the pool, which seemed to fix it.