Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port Illumos #3464 (3b2aab1) - zfs synctask code needs restructuring #1495

Closed
dweeezil opened this issue Jun 3, 2013 · 11 comments
Closed

Port Illumos #3464 (3b2aab1) - zfs synctask code needs restructuring #1495

dweeezil opened this issue Jun 3, 2013 · 11 comments
Labels
Type: Feature Feature request or new feature
Milestone

Comments

@dweeezil
Copy link
Contributor

dweeezil commented Jun 3, 2013

I came across Illumos #3464 wihle working on Illumos #2882 and will try merging it. It looks like another good one to get into ZoL. Here's the summary from the Illumos issue:

The locking code around dsl_datasets and dsl_dir's is responsible
for several deadlocks and race conditions, most prominently:

- a concurrent "zfs list" while running "zfs destroy" can cause
  the destroy to fail with EBUSY.  bug #3041 deadlock between
  dp->dp_config_rwlock and ds->ds_opening_lock

To fix this, we must restructure the locking, and change the
interface to synctasks.

The wad that will fix this also addresses the following:

- improve performance of "zfs recv" by reducing the number of calls
  to txg_wait_synced()
- fix zfs refquota can be violated by "zfs rollback" and "zfs receive"
  operations
- fix ztest fails in ztest_dmu_snapshot_hold due to bad printf string
- remove unused zc_top_ds from zfs_cmd_t
- fix zdb crashes on pools < SPA_VERSION_DEADLISTS
- remove undocumented "zfs hold -t" flag (for temporary hold)
- use ASSERT0 / VERIFY0 in more places
- new libzfs_core routines: lzc_hold(), lzc_release(), lzc_get_holds()
  (for user holds)
- move some routines from dsl_dataset.c to dsl_destroy.c,
  dsl_userhold.c
- make rrwlock.c work in userland (libzpool.so)
- change all synctask consumers to new model
- remove ds_recvlock; all receives use "%recv" child to prevent
  concurrent receives
- "zfs destroy" destroys multiple snapshots at once, so it is much faster
@dweeezil
Copy link
Contributor Author

dweeezil commented Jun 8, 2013

I'm going to put this one on hold for a bit. There are a handful of Illumos issues beginning with 3006 which should be merged first in order to prevent a cascade of merging problems. To make matters more complicated, those interim Illumos commits really should be based on my pull request #1492.

The next important Illumos issue to pull will be 3006 which adds the ASSERT0 and VERIFY0 macros. Of course, that one is mostly cosmetic but getting it pulled will greatly reduce the effort of future merges.

I'm going to work on pull requests for these important missing Illumos commits. I'm contemplating as to whether I should bother basing them on current master code or whether to base them on the branch referred to by my pull request #1492.

Once a handful of these are done, the Illumos 3464 (zfs synctask code needs restructuring) mega-patch should be a lot more straightforward.

@ryao
Copy link
Contributor

ryao commented Jul 3, 2013

Illumos 3464 also depends on Illumos 3098. It is a small patch that is easy to merge.

@dweeezil
Copy link
Contributor Author

dweeezil commented Jul 3, 2013

@ryao Thanks for catching that. I've still not had a chance to work on this one yet. I wanted to give the patch in #1492 a bit of a workout first and then this one would layer on top of that.

@ryao
Copy link
Contributor

ryao commented Jul 5, 2013

@dweeezil I am actually playing with that patch on my desktop now. It works rather nicely. Putting 3464 on top of it requires addressing numerous merge conflicts though.

@dweeezil
Copy link
Contributor Author

Quick update on my work on the Illumos 3464 patch: I've got it compiling and initially working. Other than testing, one thing I still need to do is to re-work minor creation because it depended on the dmu_objset_find_spa() function which has been deprecated in favor of the new dmu_objset_fund_dp() function.

@ryao
Copy link
Contributor

ryao commented Aug 13, 2013

@dweeezil I wish I had read this sooner. I just duplicated most of your work, although my version of the patch does not work yet. Mounting datasets hangs. I likely can fix it, but it might be best to combine our efforts. You can see what I have here:

ryao/zfs@master...illumos-merge

Note that there are many patches unrelated to 3464 in that. I just pushed my local branch to GIT before I did an initial test.

Speaking of which, some of the changes in your 2882 port are actually from 3464. I reverted a few of them in that branch, although I have not yet made a proper patch to fully remove the 3464 changes from 2882. In addition, it looks like we can resolve the dmu_objset_find_spa() removal by adopting the FreeBSD version of zvol_create_minors(). That is currently included in the 3464 patch in my branch and should be split out.

@ryao
Copy link
Contributor

ryao commented Aug 14, 2013

I have updated the branch. "Revert #3464 changes that crept into #2882 commit" properly removes the 3464 code from 2882. I have also properly broken out the zvol code from FreeBSD into its own commit. I have not yet tackled the dataset mount hang that I observed.

@dweeezil
Copy link
Contributor Author

Pushed dweeezil/zfs@9fc3119 to rebase on the latest Illumos 2882 work and, therefore, a current master. This is still in testing right now. I did however just run a quick zconfig.sh test on it and it passes all tests.

@dweeezil
Copy link
Contributor Author

@ryao As of dweeezil/zfs@9fc3119, my Illumos 3464 patch set passes all zconfig.sh and ztest tests. I've not yet collapsed all my individual commits into a single one yet. I'm going to deploy this on an almost-production server today and work it out for awhile.

@dweeezil
Copy link
Contributor Author

Initial testing on an almost-production server shows me that there are indeed some KM_PUSHPAGE fixes necessary (at least in the snapshot destruction path).

@dweeezil
Copy link
Contributor Author

The nvlist allocations have been fixed. I'm going to make a pull request now.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
3464 zfs synctask code needs restructuring
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Christopher Siden <[email protected]>
Approved by: Garrett D'Amore <[email protected]>

References:
  https://www.illumos.org/issues/3464
  illumos/illumos-gate@3b2aab1

Ported-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1495
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

No branches or pull requests

2 participants