Improved error handling for extreme rewinds #7921

Closed

Contributor

@behlendorf behlendorf commented Sep 17, 2018

Motivation and Context

It's possible to hit either of these two ASSERTs when importing a damaged
pool using the extreme rewind functionality. This should be handled
gracefully, as was done in e927fc8. Users are unlikely to encounter this
failure mode, but the ZTS tests do stress these call paths with debugging
enabled. See stacks below.

Description

The obsolete space map and obsolete counts objects may not be
accessible from the vdev's ZAP when it has been damaged. This may
be the case when performing an extreme rewind to import the pool.
It should be gracefully handled in the same way as e927fc8.

[ 6623.289194] VERIFY(err == 0 || err == 2) failed
[ 6623.293600] PANIC at vdev_indirect.c:889:vdev_obsolete_sm_object()
[ 6623.298872] Showing stack for process 7648
[ 6623.298876] CPU: 0 PID: 7648 Comm: zpool Tainted: P           OE    4.15.0-1007-aws #7-Ubuntu
[ 6623.298877] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 6623.298879] Call Trace:
[ 6623.298889]  dump_stack+0x63/0x8b
[ 6623.298902]  spl_dumpstack+0x29/0x30 [spl]
[ 6623.298909]  spl_panic+0xc8/0x110 [spl]
[ 6623.299128]  vdev_obsolete_sm_object+0xe8/0xf0 [zfs]
[ 6623.299202]  vdev_load+0x93/0x470 [zfs]
[ 6623.299355]  vdev_load+0x43/0x470 [zfs]
[ 6623.299588]  spa_ld_load_vdev_metadata+0xbb/0x12d [zfs]
[ 6623.299660]  spa_load_impl+0x177/0x3e0 [zfs]
[ 6623.299735]  spa_load+0x4e/0xf0 [zfs]
[ 6623.299807]  spa_tryimport+0x11c/0x550 [zfs]
[ 6623.299896]  zfs_ioc_pool_tryimport+0x64/0xc0 [zfs]
[ 6623.299979]  zfsdev_ioctl+0x5bf/0x690 [zfs]
[ 6623.299995]  do_vfs_ioctl+0xa8/0x630
[ 6623.300013]  SyS_ioctl+0x79/0x90
[ 6623.300023]  do_syscall_64+0x73/0x130

How Has This Been Tested?

Compiled locally; pending additional testing by the bot, which does occasionally hit this.

http://build.zfsonlinux.org/builders/Ubuntu%2018.04%20x86_64%20%28TEST%29/builds/1186/steps/shell_9/logs/console

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Contributor

@dweeezil dweeezil left a comment

LGTM. How about using FTAG rather than the function name in the string?

[EDIT]: Actually, it looks like the first one would need s/vdev_obsolete_sm_objset/vdev_obsolete_sm_object/ anyway.

@behlendorf
Contributor Author

Good idea, that would have prevented exactly the typo I made. Refreshed, and I also updated the log message in vdev_checkpoint_sm_object.

@behlendorf behlendorf force-pushed the assert-vdev_obsolete_sm_object branch from 4a174ce to 53b9c51 Compare September 18, 2018 16:26
Contributor

@sdimitro sdimitro left a comment

LGTM thanks for looking into this Brian.

- * Returns the spacemap object, or 0 if it wasn't in the ZAP or the ZAP doesn't
- * exist yet.
+ * Returns the spacemap object, or 0 if it wasn't in the ZAP,
+ * the ZAP doesn't exist yet, or the ZAP is damaged.
Member

I don't think that this semantic change is safe. Have you examined all callers to ensure that they can safely ignore the presence of this object?

if (err != 0 && err != ENOENT) {
vdev_dbgmsg(vd, "vdev_load: %s failed to retrieve obsolete "
"counts from vdev ZAP [error=%d]", FTAG, err);
ASSERT3S(err, ==, ECKSUM);
Member

I don't think that this semantic change is safe. Have you examined all callers to ensure that we can treat the counts as imprecise? It looks like spa_vdev_remove_cancel_sync() would not behave correctly - it would leak a ref on SPA_FEATURE_OBSOLETE_COUNTS (and leave the VDEV_TOP_ZAP_OBSOLETE_COUNTS_ARE_PRECISE lying around).

Contributor Author

@behlendorf behlendorf Sep 19, 2018

Semantically, other than a log message, we haven't actually changed anything for a production build. If the zap_lookup() were to fail, and it certainly can, the return value would be the same. If the callers can't already handle that, and I agree it looks like they can't, then these code paths have not been entirely safe since they were added. The same looks to be true for vdev_obsolete_sm_object() and its callers.

My feeling is the best way to handle this would be to rework the interface and callers to acknowledge that things aren't as simple as whether the object does or doesn't exist. There are other possible legitimate but annoying failure scenarios.

The only other strictly safe option I can see would be to convert the ASSERTs to VERIFYs so at least we crash the system before any damage can be done, which isn't great from a user perspective.

I should also mention that I only started looking at this because it is a case not infrequently hit by the CI, which runs at least one test that performs an extreme pool rewind.

Member

You're right, this should be a VERIFY, not an ASSERT. Or as you said, we'd need to change this to be able to return an error and make the callers handle that.
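As background for the ASSERT versus VERIFY distinction in this thread, here is a minimal sketch of the general pattern: an assertion that disappears from production builds next to a check that is always enforced. The `DEMO_*` macros and helper functions are hypothetical and heavily simplified; they are not the ZFS or SPL definitions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical always-on check: aborts whenever the condition is false. */
#define DEMO_VERIFY(cond)                                       \
    do {                                                        \
        if (!(cond)) {                                          \
            fprintf(stderr, "VERIFY(%s) failed\n", #cond);      \
            abort();                                            \
        }                                                       \
    } while (0)

#ifdef DEMO_DEBUG
/* Debug build: the assertion behaves exactly like the always-on check. */
#define DEMO_ASSERT(cond)   DEMO_VERIFY(cond)
#else
/* Production build: the condition is not evaluated at all. */
#define DEMO_ASSERT(cond)   ((void)0)
#endif

static int demo_checks_run;

/* Records that a check was actually evaluated and reports success. */
static int
demo_check(void)
{
    demo_checks_run++;
    return (1);
}

/* Returns how many of the two checks were evaluated in this build. */
static int
demo_count_checks(void)
{
    demo_checks_run = 0;
    DEMO_ASSERT(demo_check());
    DEMO_VERIFY(demo_check());
    return (demo_checks_run);
}
```

Built without `DEMO_DEBUG`, `demo_count_checks()` reports a single evaluated check. An unexpected error guarded only by the assertion would therefore pass silently in a production build, which is the concern raised above.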

Contributor

I think whatever is decided, we should also change e927fc8, either on this review, or we can open a bug and I can make that change as a follow-on.

Contributor Author

Let me see about reworking this to return an error and updating all the callers. I'd rather not be forced to disable the ZTS tests which can hit this.

@behlendorf behlendorf added Status: Code Review Needed Ready for review and testing Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Sep 18, 2018
@behlendorf behlendorf force-pushed the assert-vdev_obsolete_sm_object branch from 53b9c51 to ca5150b Compare October 10, 2018 22:43
@behlendorf behlendorf added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Oct 10, 2018
@behlendorf behlendorf changed the title from "Allow ECKSUM in vdev_obsolete_*()" to "Improved error handling for extreme rewinds" Oct 10, 2018
@behlendorf
Contributor Author

This is ready for another round of review. I've updated the interfaces to return an error when the zap_lookup() fails for a reason other than ENOENT. This made it straightforward to update all the callers, since in the non-error case the basic logic didn't need to change. vdev_load() was updated to handle an error, but the other callers, where a failure is far less likely and difficult to handle, were not.
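The interface shape described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `mock_zap_lookup()`, `mock_sm_object()`, and the `mock_zap_state_t` enum are hypothetical stand-ins, and `EIO` substitutes for ECKSUM, which is not a portable errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states for a mocked ZAP; not part of ZFS. */
typedef enum {
    MOCK_ENTRY_PRESENT,
    MOCK_ENTRY_MISSING,
    MOCK_ZAP_DAMAGED
} mock_zap_state_t;

/* Hypothetical stand-in for zap_lookup(); the real signature differs. */
static int
mock_zap_lookup(mock_zap_state_t state, uint64_t *value)
{
    switch (state) {
    case MOCK_ENTRY_PRESENT:
        *value = 42;    /* arbitrary object number for the demo */
        return (0);
    case MOCK_ENTRY_MISSING:
        return (ENOENT);
    default:
        return (EIO);   /* stands in for ECKSUM in this sketch */
    }
}

/*
 * Returns 0 on success. A missing entry is not an error: *sm_obj is
 * set to 0. Any other lookup failure is returned to the caller.
 */
static int
mock_sm_object(mock_zap_state_t state, uint64_t *sm_obj)
{
    assert(sm_obj != NULL);

    int error = mock_zap_lookup(state, sm_obj);
    if (error == ENOENT) {
        *sm_obj = 0;
        error = 0;
    }
    return (error);
}
```

Because the non-error results are unchanged (a real object number, or 0 when the entry is absent), existing callers only need to add a check of the return value.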

This should resolve the crashes we've observed with the extreme rewind tests in the ZTS. Locally I ran the rewind tests 100 times and wasn't able to reproduce the original issue.

@@ -298,6 +298,7 @@ void zfs_znode_byteswap(void *buf, size_t size);
#define DMU_MAX_ACCESS (64 * 1024 * 1024) /* 64MB */
#define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */

+#define DMU_NO_OBJECT 0 /* no object id */
Member

This is the same as DMU_META_DNODE_OBJECT, which could be confusing. I see that this is replacing ZFS_NO_OBJECT (which was confined to the ZPL at least), but it seems like maybe a bad idea to proliferate this.

*/
int
-vdev_checkpoint_sm_object(vdev_t *vd)
+vdev_checkpoint_sm_object(vdev_t *vd, uint64_t *sm_obj)
Member

Wow, the previous code was quite wrong to return an object number as an int. Seems like the compiler could have warned about implicitly throwing away the high bits.

Contributor Author

My thoughts exactly when I saw this, I'm not sure why the compiler wasn't more vocal about this.
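The hazard discussed in this thread can be reproduced in isolation. The sketch below uses hypothetical helper names and assumes a platform where `int` is 32 bits wide. GCC, for instance, only reports this kind of implicit narrowing under `-Wconversion`, which `-Wall` and `-Wextra` do not enable.

```c
#include <stdint.h>

/* Hypothetical accessor that narrows a 64-bit object number to int. */
static int
demo_object_as_int(uint64_t obj)
{
    /*
     * Implicit narrowing conversion. For values that do not fit in
     * an int the result is implementation-defined; common compilers
     * keep only the low-order bits.
     */
    return (obj);
}

/* Reshaped accessor: status in the return value, object via pointer. */
static int
demo_object_as_u64(uint64_t obj, uint64_t *out)
{
    *out = obj;
    return (0);
}
```

With an object number such as `(1ULL << 32) | 5`, the first helper cannot return the original value, while the second passes it through intact and leaves the return value free to carry an error code.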

*/
vd->vdev_stat.vs_checkpoint_space =
-vd->vdev_checkpoint_sm->sm_alloc;
vd->vdev_spa->spa_checkpoint_info.sci_dspace +=
vd->vdev_stat.vs_checkpoint_space;
} else if (error) {
Member

error != 0

@@ -3031,6 +3036,10 @@ vdev_load(vdev_t *vd)
return (error);
}
space_map_update(vd->vdev_obsolete_sm);
} else if (error) {
Member

error != 0

-ASSERT(obsolete_sm_obj != 0);
+uint64_t obsolete_sm_obj;
+VERIFY0(vdev_obsolete_sm_object(vd, &obsolete_sm_obj));
+ASSERT3U(obsolete_sm_obj, !=, 0);
Member

Seems like we should either change 0 to DMU_NO_OBJECT in this context, or change vdev_obsolete_sm_object() to simply return 0.

Contributor Author

I wasn't sure how to handle an error in this case so I opted for a VERIFY0 to be consistent with the zap_add and zap_remove operations down a few lines. They can technically fail for exactly the same reason, even though it's really unlikely post import. At the time it seemed preferable to ignoring the error which is how the current code effectively handles this.

/*
* Gets the obsolete count are precise spacemap object from the vdev's ZAP.
* On success are_precise will be set to reflect is the counts are precise.
* All other errors are returned to the caller.
Member

... which all callers assert/verify is 0. So there's no behavior change but I guess this is ready for callers that can handle the error?

Contributor Author

Right, that was my intent. This way the import path could be updated and the other callers could functionally remain unchanged. None of the other callers were set up to be able to do anything reasonable in case of an error, and I wanted to keep the change small.
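The split between the two kinds of callers can be sketched as below. All names are hypothetical: `demo_lookup()` stands in for the reworked accessors, `demo_load()` for an import-path caller that can report failure, and `demo_sync()` for a caller with no recovery path, where `abort()` plays the role of VERIFY0().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical lookup: fails with fail_with if nonzero, else yields 42. */
static int
demo_lookup(int fail_with, uint64_t *obj)
{
    if (fail_with != 0)
        return (fail_with);
    *obj = 42;
    return (0);
}

/* Load-style caller: log the failure and hand the error back. */
static int
demo_load(int fail_with)
{
    uint64_t obj = 0;
    int error = demo_lookup(fail_with, &obj);

    if (error != 0) {
        fprintf(stderr, "demo_load: lookup failed [error=%d]\n",
            error);
        return (error);
    }
    return (0);
}

/* Sync-style caller: no recovery path, so stop rather than continue. */
static uint64_t
demo_sync(int fail_with)
{
    uint64_t obj = 0;
    int error = demo_lookup(fail_with, &obj);

    if (error != 0)
        abort();    /* plays the role of VERIFY0() in this sketch */
    return (obj);
}
```

Only the load-style caller changes behavior when the lookup fails. The sync-style caller stays functionally the same in the success case, which keeps the change small.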

@ahrens
Member

ahrens commented Oct 11, 2018

Thanks for doing this!

The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and
vdev_obsolete_counts_are_precise() functions assume that the
only way a zap_lookup() can fail is if the requested entry is
missing.  While this is the most common cause, it's not the only
cause.  Attempting to access a damaged ZAP will result in other
errors.

The most likely scenario for accessing a damaged ZAP is during
an extreme rewind pool import.  Under these conditions the pool
is expected to contain damaged objects and the import code was
updated to handle this gracefully.  Getting an ECKSUM error from
these ZAPs after the pool is imported is far less likely, therefore
the behavior for those call paths was not modified.

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf behlendorf force-pushed the assert-vdev_obsolete_sm_object branch from ca5150b to 5cb7417 Compare October 11, 2018 20:33
@behlendorf
Contributor Author

Refreshed based on review feedback. I don't have any great ideas for how to handle a failure in the sync task so I left that as a VERIFY0. Better ideas are welcome.

  • Reverted ZFS_NO_OBJECT -> DMU_NO_OBJECT changes. That's out of scope for this change so I decided to leave it as is.
  • vdev_indirect_state_sync_verify() switched to ASSERT0.
  • cstyle fixes.

@codecov

codecov bot commented Oct 12, 2018

Codecov Report

Merging #7921 into master will increase coverage by 11.95%.
The diff coverage is 82.35%.

@@             Coverage Diff             @@
##           master    #7921       +/-   ##
===========================================
+ Coverage   66.74%   78.69%   +11.95%     
===========================================
  Files         314      378       +64     
  Lines       97292   114240    +16948     
===========================================
+ Hits        64934    89900    +24966     
+ Misses      32358    24340     -8018
Flag      Coverage Δ
#kernel   78.79% <74%> (?)
#user     68.08% <76.47%> (+1.34%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5d43cc9...5cb7417.

Contributor

@sdimitro sdimitro left a comment

Thanks for doing this Brian, it's great that these errors are propagated now.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 12, 2018
behlendorf added a commit that referenced this pull request Oct 12, 2018
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and
vdev_obsolete_counts_are_precise() functions assume that the
only way a zap_lookup() can fail is if the requested entry is
missing.  While this is the most common cause, it's not the only
cause.  Attempting to access a damaged ZAP will result in other
errors.

The most likely scenario for accessing a damaged ZAP is during
an extreme rewind pool import.  Under these conditions the pool
is expected to contain damaged objects and the import code was
updated to handle this gracefully.  Getting an ECKSUM error from
these ZAPs after the pool is imported is far less likely, therefore
the behavior for those call paths was not modified.

Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #7809
Closes #7921
ghfields pushed a commit to ghfields/zfs that referenced this pull request Oct 29, 2018
This reverts commit e927fc8.

Reviewed by: Tim Chase <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7921
ghfields pushed a commit to ghfields/zfs that referenced this pull request Oct 29, 2018
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and
vdev_obsolete_counts_are_precise() functions assume that the
only way a zap_lookup() can fail is if the requested entry is
missing.  While this is the most common cause, it's not the only
cause.  Attempting to access a damaged ZAP will result in other
errors.

The most likely scenario for accessing a damaged ZAP is during
an extreme rewind pool import.  Under these conditions the pool
is expected to contain damaged objects and the import code was
updated to handle this gracefully.  Getting an ECKSUM error from
these ZAPs after the pool is imported is far less likely, therefore
the behavior for those call paths was not modified.

Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7809
Closes openzfs#7921
GregorKopka pushed a commit to GregorKopka/zfs that referenced this pull request Jan 7, 2019
This reverts commit e927fc8.

Reviewed by: Tim Chase <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7921
GregorKopka pushed a commit to GregorKopka/zfs that referenced this pull request Jan 7, 2019
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and
vdev_obsolete_counts_are_precise() functions assume that the
only way a zap_lookup() can fail is if the requested entry is
missing.  While this is the most common cause, it's not the only
cause.  Attempting to access a damaged ZAP will result in other
errors.

The most likely scenario for accessing a damaged ZAP is during
an extreme rewind pool import.  Under these conditions the pool
is expected to contain damaged objects and the import code was
updated to handle this gracefully.  Getting an ECKSUM error from
these ZAPs after the pool is imported is far less likely, therefore
the behavior for those call paths was not modified.

Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#7809
Closes openzfs#7921
@behlendorf behlendorf deleted the assert-vdev_obsolete_sm_object branch April 19, 2021 19:28
Labels
Status: Accepted Ready to integrate (reviewed, tested)