
(dsl_pool.c:1032:dsl_pool_config_enter()) ASSERTION(!rrw_held(&dp->dp_config_rwlock, RW_READER)) failed #1886

Closed
chrisrd opened this issue Nov 19, 2013 · 6 comments

Contributor

chrisrd commented Nov 19, 2013

I'm getting this whilst trying to do a 'zpool list' after a fresh boot.

Linux-3.10.18, using current HEAD: zfs@64ad2b26, spl@dd33a16.

The ASSERT() comes from #1495 (zfs@13fe019, Illumos #3464).

Example kern.log extract:

Nov 19 13:35:35 b5 kernel: [ 3978.074853] SPLError: 25099:0:(dsl_pool.c:1032:dsl_pool_config_enter()) ASSERTION(!rrw_held(&dp->dp_config_rwlock, RW_READER)) failed
Nov 19 13:35:35 b5 kernel: [ 3978.074914] SPLError: 25099:0:(dsl_pool.c:1032:dsl_pool_config_enter()) SPL PANIC
Nov 19 13:35:35 b5 kernel: [ 3978.074960] SPL: Showing stack for process 25099
Nov 19 13:35:35 b5 kernel: [ 3978.074988] CPU: 10 PID: 25099 Comm: zfs Tainted: G           O 3.10.18-otn-00011-g883dc78 #1
Nov 19 13:35:35 b5 kernel: [ 3978.075051] Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.0c       07/19/11  
Nov 19 13:35:35 b5 kernel: [ 3978.075096]  ffffffffa0741247 ffff880c20e55408 ffffffff813c611e ffff880c20e55418
Nov 19 13:35:35 b5 kernel: [ 3978.075148]  ffffffffa0074557 ffff880c20e55458 ffffffffa00756ba ffffffffa00893dd
Nov 19 13:35:35 b5 kernel: [ 3978.075199]  ffff88028be2ed40 ffffffffa0741247 ffff880c20e554e8 0000000000000000
Nov 19 13:35:35 b5 kernel: [ 3978.075251] Call Trace:
Nov 19 13:35:35 b5 kernel: [ 3978.075281]  [<ffffffff813c611e>] dump_stack+0x19/0x1b
Nov 19 13:35:35 b5 kernel: [ 3978.075314]  [<ffffffffa0074557>] spl_debug_dumpstack+0x27/0x40 [spl]
Nov 19 13:35:35 b5 kernel: [ 3978.075345]  [<ffffffffa00756ba>] spl_debug_bug+0x7a/0xe0 [spl]
Nov 19 13:35:35 b5 kernel: [ 3978.075406]  [<ffffffffa06802ed>] dsl_pool_config_enter+0x8d/0x90 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075456]  [<ffffffffa06832f9>] dsl_pool_hold+0x49/0x60 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075502]  [<ffffffffa065510b>] dmu_objset_hold+0x2b/0xb0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075557]  [<ffffffffa06ec7d8>] get_zfs_sb+0x28/0x150 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075610]  [<ffffffffa06ecab1>] zfs_sb_hold+0x31/0xb0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075664]  [<ffffffffa06f31ea>] zfs_unmount_snap+0x8a/0x140 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075720]  [<ffffffffa0734412>] dsl_dataset_user_release_impl+0xd2/0x1c0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075791]  [<ffffffffa07342f0>] ? dsl_dataset_user_release_sync+0x3b0/0x3b0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075838]  [<ffffffffa00fd853>] ? avl_find+0x63/0x120 [zavl]
Nov 19 13:35:35 b5 kernel: [ 3978.075866]  [<ffffffffa00fe6fa>] ? avl_nearest+0x3a/0xa0 [zavl]
Nov 19 13:35:35 b5 kernel: [ 3978.075920]  [<ffffffffa06db62c>] ? zap_cursor_retrieve+0x1ac/0x430 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.075953]  [<ffffffffa007871b>] ? kmem_free_debug+0x4b/0x150 [spl]
Nov 19 13:35:35 b5 kernel: [ 3978.076008]  [<ffffffffa073534d>] dsl_dataset_user_release_tmp+0x2d/0x80 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076074]  [<ffffffffa0682f4b>] dsl_pool_clean_tmp_userrefs+0x12b/0x1a0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076141]  [<ffffffffa0696246>] ? refcount_remove_many+0x1b6/0x330 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076194]  [<ffffffffa06963d6>] ? refcount_remove+0x16/0x20 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076246]  [<ffffffffa06b1416>] ? spa_close+0x46/0xb0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076300]  [<ffffffffa0730750>] ? dsl_destroy_head+0x160/0x160 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076355]  [<ffffffffa0730750>] ? dsl_destroy_head+0x160/0x160 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076410]  [<ffffffffa0730750>] ? dsl_destroy_head+0x160/0x160 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076456]  [<ffffffffa0651305>] ? dmu_objset_find+0x65/0x80 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076509]  [<ffffffffa06a8c6d>] spa_load+0x1e9d/0x1ff0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076540]  [<ffffffff81061740>] ? finish_wait+0x80/0x80
Nov 19 13:35:35 b5 kernel: [ 3978.076591]  [<ffffffffa06a7a56>] spa_load+0xc86/0x1ff0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076619]  [<ffffffff81061723>] ? finish_wait+0x63/0x80
Nov 19 13:35:35 b5 kernel: [ 3978.076646]  [<ffffffff81061740>] ? finish_wait+0x80/0x80
Nov 19 13:35:35 b5 kernel: [ 3978.076697]  [<ffffffffa06a41cb>] ? spa_activate+0x19b/0x6e0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076750]  [<ffffffffa06a8e0d>] spa_load_best+0x4d/0x2e0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076802]  [<ffffffffa06b1e49>] ? spa_lookup+0x69/0xc0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076854]  [<ffffffffa06a91e8>] spa_open_common+0x148/0x440 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076883]  [<ffffffff8112a506>] ? __kmalloc+0x156/0x160
Nov 19 13:35:35 b5 kernel: [ 3978.076934]  [<ffffffffa06a94f3>] spa_open+0x13/0x20 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.076987]  [<ffffffffa06f35ab>] pool_status_check+0x5b/0x110 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.077042]  [<ffffffffa06f3a22>] zfsdev_ioctl+0x3c2/0x5d0 [zfs]
Nov 19 13:35:35 b5 kernel: [ 3978.077071]  [<ffffffff8114c187>] ? prepend+0x27/0x40
Nov 19 13:35:35 b5 kernel: [ 3978.077099]  [<ffffffff811559b4>] ? mntput+0x24/0x40
Nov 19 13:35:35 b5 kernel: [ 3978.077126]  [<ffffffff81142072>] ? path_put+0x22/0x30
Nov 19 13:35:35 b5 kernel: [ 3978.077152]  [<ffffffff8114d045>] ? d_path+0x105/0x150
Nov 19 13:35:35 b5 kernel: [ 3978.077180]  [<ffffffff81159002>] ? seq_escape+0xb2/0x120
Nov 19 13:35:35 b5 kernel: [ 3978.077206]  [<ffffffff81158aa9>] ? seq_puts+0x49/0x70
Nov 19 13:35:35 b5 kernel: [ 3978.077233]  [<ffffffff81158aa9>] ? seq_puts+0x49/0x70
Nov 19 13:35:35 b5 kernel: [ 3978.077261]  [<ffffffff811749e5>] ? show_vfsmnt+0x115/0x150
Nov 19 13:35:35 b5 kernel: [ 3978.077288]  [<ffffffff81154918>] ? m_start+0x38/0x50
Nov 19 13:35:35 b5 kernel: [ 3978.077314]  [<ffffffff81149af0>] do_vfs_ioctl+0x90/0x520
Nov 19 13:35:35 b5 kernel: [ 3978.077341]  [<ffffffff81138bb9>] ? vfs_read+0xa9/0x180
Nov 19 13:35:35 b5 kernel: [ 3978.077368]  [<ffffffff81149fd0>] SyS_ioctl+0x50/0x90
Nov 19 13:35:35 b5 kernel: [ 3978.077396]  [<ffffffff813d3082>] system_call_fastpath+0x16/0x1b
Contributor Author

chrisrd commented Nov 19, 2013

Call chain looks like:

zfsdev_ioctl()
  pool_status_check()
    spa_open()
      spa_open_common()
        spa_load_best()
          spa_load()
            spa_load_impl()
              dsl_pool_clean_tmp_userrefs()
                dsl_dataset_user_release_tmp()
               +- dsl_dataset_user_release_impl()
               |    zfs_unmount_snap(name)
               |    zfs_unmount_snap(nvpair_name(pair))
               |      zfs_sb_hold()
               |        get_zfs_sb()
               |          dmu_objset_hold()
               |            dsl_pool_hold()
          +>>>>>>>>>>>>>>>>>> dsl_pool_config_enter()
          |    |                ASSERT(!rrw_held(&dp->dp_config_rwlock, RW_READER));
          |    +- dsl_dataset_user_release_impl()
          |         error = dsl_sync_task(pool, dsl_dataset_user_release_check, dsl_dataset_user_release_sync, &ddura, fnvlist_num_pairs(holds));
          |           dsl_sync_task()
          +>>>>>>>>>>>> dsl_pool_config_enter()

Hmmm, with those zfs_unmount_snap() calls in there, it looks like it could be the same as https://www.illumos.org/issues/4322.
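
For reference, the function that trips is tiny. Here's a sketch of dsl_pool_config_enter() reconstructed from the assertion in the log above (the rrw_enter() call following the ASSERT is an assumption about the surrounding code, not verbatim source):

void
dsl_pool_config_enter(dsl_pool_t *dp, void *tag)
{
    /* Fires if this thread already holds dp_config_rwlock as reader. */
    ASSERT(!rrw_held(&dp->dp_config_rwlock, RW_READER));
    rrw_enter(&dp->dp_config_rwlock, RW_READER, tag);
}

The rrwlock is re-entrant for readers in general, but this assertion deliberately forbids re-entry here, and re-entry is exactly what the nested call chain above does.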

Contributor Author

chrisrd commented Nov 19, 2013

Simplified call chain:

zfsdev_ioctl()
  pool_status_check()
    spa_open()
      spa_open_common()
        spa_load_best()
          spa_load()
            spa_load_impl()
              dsl_pool_clean_tmp_userrefs()
                dsl_dataset_user_release_tmp()
                  dsl_dataset_user_release_impl()
          +>>>>>>>> dsl_pool_config_enter()               #### 1
          |         zfs_unmount_snap(name)
          |           zfs_sb_hold()
          |             get_zfs_sb()
          |               dmu_objset_hold()
          |                 dsl_pool_hold()
          +>>>>>>>>>>>>>>>>>> dsl_pool_config_enter()    #### 2  BOOM!!

I think it's definitely the same as https://www.illumos.org/issues/4322.
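
To make the failure mode concrete, here's a minimal userland model of the two nested enters (hypothetical code, not OpenZFS source; a plain counter stands in for the rrwlock's per-thread reader tracking):

#include <assert.h>

/* Toy stand-in for rrw_held()'s per-thread reader tracking. */
typedef struct {
    int rr_readers;
} model_rrwlock_t;

static int
model_rrw_held(model_rrwlock_t *rrl)
{
    return (rrl->rr_readers > 0);
}

static void
model_config_enter(model_rrwlock_t *rrl)
{
    /* Mirrors ASSERT(!rrw_held(&dp->dp_config_rwlock, RW_READER)). */
    assert(!model_rrw_held(rrl));
    rrl->rr_readers++;
}

static void
model_config_exit(model_rrwlock_t *rrl)
{
    rrl->rr_readers--;
}

int
main(void)
{
    model_rrwlock_t rrl = { 0 };

    model_config_enter(&rrl);   /* #### 1  dsl_dataset_user_release_impl() */
    model_config_enter(&rrl);   /* #### 2  via zfs_unmount_snap(): BOOM */
    model_config_exit(&rrl);
    model_config_exit(&rrl);
    return (0);
}

The second enter aborts on the assert, exactly like the SPL PANIC above.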

Contributor

dweeezil commented Nov 19, 2013

@chrisrd It sounds like you're right. From reading the illumos issue, its author, @stevenh, is currently evaluating fixes for this. From a quick read, the proposed fix is to reduce the dsl_pool_config_enter()/dsl_pool_config_exit() span to cover only the call to dsl_dataset_hold_obj_string(), but the question is whether that is sufficient in light of the calls to dsl_dataset_name() and dsl_dataset_rele().

In the meantime, I guess the only workaround is to not zfs send mounted snapshots.
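
In terms of the toy model in the previous comment, the proposed narrowing amounts to exiting the lock before the nested path runs (again hypothetical; this reuses the model_* definitions from the sketch above):

static void
model_release_impl_fixed(model_rrwlock_t *rrl)
{
    model_config_enter(rrl);    /* cover only the dataset lookup */
    /* ... dsl_dataset_hold_obj_string(), copy out the snapshot name ... */
    model_config_exit(rrl);     /* drop the lock before unmounting */

    /* zfs_unmount_snap()'s nested enter now finds the lock released. */
    model_config_enter(rrl);
    model_config_exit(rrl);
}

Whether dsl_dataset_name() and dsl_dataset_rele() also need to stay inside the locked region is exactly the open question above.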

Contributor

stevenh commented Nov 19, 2013

The change was submitted for RTI a few days ago.

For the record, it was deemed that holding the lock over dsl_dataset_hold_obj_string() and dsl_dataset_name() was sufficient.

Regards
Steve
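
For readers following along, this is roughly the shape of the unmount path with the fix applied, reconstructed from this thread and the illumos issue (variable names, lock/rele ordering, and error handling are approximations, not the verbatim commit):

dsl_pool_config_enter(dp, FTAG);
error = dsl_dataset_hold_obj_string(dp, nvpair_name(pair), FTAG, &ds);
if (error == 0) {
    char name[MAXNAMELEN];
    dsl_dataset_name(ds, name);     /* still under the lock */
    dsl_pool_config_exit(dp, FTAG); /* drop before unmounting */
    dsl_dataset_rele(ds, FTAG);
    (void) zfs_unmount_snap(name);  /* may re-enter the lock safely */
} else {
    dsl_pool_config_exit(dp, FTAG);
}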


Contributor Author

chrisrd commented Nov 19, 2013

@stevenh Thanks, I found your illumos/illumos-gate@c50d56f fix 15 minutes ago and have ported it over to my ZoL repository - just compiling it now to test on my data set before putting up a pull request.

chrisrd added a commit to chrisrd/zfs that referenced this issue Nov 19, 2013
4322 ZFS deadlock on dp_config_rwlock
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Ilya Usvyatsky <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://www.illumos.org/issues/4322
  illumos/illumos-gate@c50d56f

Ported by: Chris Dunlop <[email protected]>
Closes openzfs#1886
Contributor Author

chrisrd commented Nov 19, 2013

@dweeezil, @stevenh Yup, that fixed my problem - pull request sent.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
4322 ZFS deadlock on dp_config_rwlock
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Ilya Usvyatsky <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
  https://www.illumos.org/issues/4322
  illumos/illumos-gate@c50d56f

Ported by: Chris Dunlop <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1886