Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long hold the dataset during upgrade #6837

Merged
merged 1 commit into from
Nov 10, 2017

Conversation

ab-oe
Copy link
Contributor

@ab-oe ab-oe commented Nov 7, 2017

Description

Added long hold of dataset during whole upgrade process. The receive and rollback will return an EBUSY error until the upgrade is not finished.

Motivation and Context

If the receive or rollback is performed while filesystem is upgrading the object set may be evicted in dsl_dataset_clone_swap_sync_impl. This will lead to NULL pointer dereference when upgrade tries to access evicted object set. Solves #5295.

How Has This Been Tested?

Tested in following scenario:

  • Created two pools without features enabled and one dataset for each pool (one for source and one for destination).
  • Created snapshots on source dataset every 15s and send them to the destination. I used the znapzend for testing purposes. It executes send and receive with following parameters:
    zfs send -I src/vol00 autosnap_2017-11-07-080400 src/vol00 autosnap_2017-11-07-080415 | zfs recv -F dst/vol00
  • Wrote about 10GB of small files to source dataset to make sure that z_upgrade will run during execution of zfs recv and zfs rollback
  • Upgraded destination zpool.
  • Remounted the destination dataset (kicked z_upgrade to do its job).
  • The zfs recv commands failed with a busy error.
  • The zfs rollback command failed with a busy error.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@@ -1320,10 +1320,19 @@ dmu_objset_upgrade_task_cb(void *data)
static void
dmu_objset_upgrade(objset_t *os, dmu_objset_upgrade_cb_t cb)
{
boolean_t config_held = B_FALSE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than add this conditional logic to dmu_objset_upgrade() it looks like it might be cleaner to ASSERT that the caller has the dsl pool config log held. That would allow you to unconditionally take the long hold as you originally intended. Doing so should be straight forward in the zfs_ioc_userobjspace_upgrade and dmu_objset_own callers by moving down the dsl_pool_rele call a little bit. As for the call in zfs_fuid_overobjquota() you should be able to safely take and drop the pool config lock in the upgrade case unconditionally here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I'll fix this.

@codecov
Copy link

codecov bot commented Nov 8, 2017

Codecov Report

Merging #6837 into master will decrease coverage by 0.05%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6837      +/-   ##
==========================================
- Coverage   75.19%   75.14%   -0.06%     
==========================================
  Files         297      297              
  Lines       94453    94450       -3     
==========================================
- Hits        71026    70970      -56     
- Misses      23427    23480      +53
Flag Coverage Δ
#kernel 74.51% <90.9%> (-0.17%) ⬇️
#user 67.43% <16.66%> (-0.02%) ⬇️
Impacted Files Coverage Δ
module/zfs/zfs_vfsops.c 78.19% <100%> (+0.17%) ⬆️
module/zfs/zfs_ioctl.c 82.94% <100%> (ø) ⬆️
module/zfs/dmu_objset.c 90.24% <83.33%> (-0.05%) ⬇️
module/zfs/zio.c 87.72% <0%> (-5.86%) ⬇️
module/zfs/spa_errlog.c 89.23% <0%> (-3.85%) ⬇️
lib/libzpool/kernel.c 66.82% <0%> (-3.18%) ⬇️
cmd/zed/agents/zfs_retire.c 77.01% <0%> (-2.49%) ⬇️
module/zfs/dsl_scan.c 81.2% <0%> (-1.67%) ⬇️
module/zfs/zrlock.c 82.81% <0%> (-1.57%) ⬇️
module/zfs/mmp.c 95.7% <0%> (-1.23%) ⬇️
... and 42 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62df1bc...4e6ac3c. Read the comment docs.

@ab-oe ab-oe force-pushed the upgrade_race_condition branch from 75a8546 to f716f03 Compare November 8, 2017 07:51
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the patch looks good. Just one remaining minor issue to wrap up.

@@ -1311,6 +1313,7 @@ dmu_objset_upgrade_task_cb(void *data)
os->os_upgrade_exit = B_TRUE;
os->os_upgrade_id = 0;
mutex_exit(&os->os_upgrade_lock);
dsl_dataset_long_rele(dmu_objset_ds(os), upgrade_tag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one case in dmu_objset_upgrade_stop() when you need to drop the long hold. When taskq_cancel_id() return's 0 it indicates the taskqid was able to be be successfully canceled before it was executed by a thread. In which case you'll need to drop the long hold there. In both the ENOENT and EBUSY error cases the callback will either have run or be currently running so there's no problem there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I'll fix this.

If the receive or rollback is performed while filesystem is upgrading
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
will lead to NULL pointer dereference when upgrade tries to access
evicted objset.

This commit adds long hold of dataset during whole upgrade process.
The receive and rollback will return an EBUSY error until the
upgrade is not finished.

Signed-off-by: Arkadiusz Bubała <[email protected]>
@ab-oe ab-oe force-pushed the upgrade_race_condition branch from f716f03 to 4e6ac3c Compare November 9, 2017 07:22
@behlendorf behlendorf merged commit c0daec3 into openzfs:master Nov 10, 2017
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 21, 2017
If the receive or rollback is performed while filesystem is upgrading
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
will lead to NULL pointer dereference when upgrade tries to access
evicted objset.

This commit adds long hold of dataset during whole upgrade process.
The receive and rollback will return an EBUSY error until the
upgrade is not finished.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes openzfs#5295
Closes openzfs#6837
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Jan 29, 2018
If the receive or rollback is performed while filesystem is upgrading
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
will lead to NULL pointer dereference when upgrade tries to access
evicted objset.

This commit adds long hold of dataset during whole upgrade process.
The receive and rollback will return an EBUSY error until the
upgrade is not finished.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes openzfs#5295
Closes openzfs#6837
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
If the receive or rollback is performed while filesystem is upgrading
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
will lead to NULL pointer dereference when upgrade tries to access
evicted objset.

This commit adds long hold of dataset during whole upgrade process.
The receive and rollback will return an EBUSY error until the
upgrade is not finished.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes openzfs#5295
Closes openzfs#6837
FransUrbo pushed a commit to FransUrbo/zfs that referenced this pull request Apr 28, 2019
If the receive or rollback is performed while filesystem is upgrading
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
will lead to NULL pointer dereference when upgrade tries to access
evicted objset.

This commit adds long hold of dataset during whole upgrade process.
The receive and rollback will return an EBUSY error until the
upgrade is not finished.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Arkadiusz Bubała <[email protected]>
Closes openzfs#5295 
Closes openzfs#6837
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants