-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved error handling for extreme rewinds #7921
Improved error handling for extreme rewinds #7921
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. How about using FTAG
rather than the function name in the string?
[EDIT]: Actually, it looks like the first one would need s/vdev_obsolete_sm_objset/vdev_obsolete_sm_object/ anyway.
Good idea, that would have prevented exactly the typo I made. Refreshed and I updated the log message in |
4a174ce
to
53b9c51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for looking into this Brian.
module/zfs/vdev_indirect.c
Outdated
* Returns the spacemap object, or 0 if it wasn't in the ZAP or the ZAP doesn't | ||
* exist yet. | ||
* Returns the spacemap object, or 0 if it wasn't in the ZAP, | ||
* the ZAP doesn't exist yet, or the ZAP is damaged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this semantic change is safe. Have you examined all callers to ensure that they can safely ignore the presence of this object?
module/zfs/vdev_indirect.c
Outdated
if (err != 0 && err != ENOENT) { | ||
vdev_dbgmsg(vd, "vdev_load: %s failed to retrieve obsolete " | ||
"counts from vdev ZAP [error=%d]", FTAG, err); | ||
ASSERT3S(err, ==, ECKSUM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this semantic change is safe. Have you examined all callers to ensure that we can treat the counts as imprecise? It looks like spa_vdev_remove_cancel_sync() would not behave correctly - it would leak a ref on SPA_FEATURE_OBSOLETE_COUNTS (and leave the VDEV_TOP_ZAP_OBSOLETE_COUNTS_ARE_PRECISE lying around).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, other than a log message, we haven't actually changed anything for a production build. If the zap_lookup()
were to fail, and it certainly can, the return value would be the same. If the callers can't already handle that, and I agree it looks like they can't, then these code paths have not been entirely safe since there were added. The same looks to be true for vdev_obsolete_sm_object()
and its callers.
My feeling is the best way to handle this would be to rework the interface and callers to acknowledge that things aren't as simple as it does or doesn't exists. There are other possible legitimate but annoying failure scenarios.
The only other strictly safe option I can see would be to convert the ASSERTs to VERIFYs so at least we crash the system before any damage can be done. Which isn't great from a user perspective.
I should also mention that I only started looking at this because it is a case not infrequently hit by the CI, which runs at least one test that performance an extreme pool rewind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this should be a VERIFY, not an ASSERT. Or as you said, we'd need to change this to be able to return an error and make the callers handle that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whatever is decided, we should also change e927fc8, either on this review, or we can open a bug and I can make that change as a follow-on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see about reworking this to return an error and updating all the callers. I'd rather not be forced to disable the ZTS tests which can hit this.
This reverts commit e927fc8.
53b9c51
to
ca5150b
Compare
This is ready for another round of review. I've updated the interfaces to return an error when the This should resolve the crashes we've observed with the extreme rewind tests in the ZTS. Locally I ran the rewind tests 100 times and wasn't able to reproduce the original issue. |
include/sys/dmu.h
Outdated
@@ -298,6 +298,7 @@ void zfs_znode_byteswap(void *buf, size_t size); | |||
#define DMU_MAX_ACCESS (64 * 1024 * 1024) /* 64MB */ | |||
#define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */ | |||
|
|||
#define DMU_NO_OBJECT 0 /* no object id */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as DMU_META_DNODE_OBJECT, which could be confusing. I see that this is replacing ZFS_NO_OBJECT (which was confined to the ZPL at least), but it seems like maybe a bad idea to proliferate this.
*/ | ||
int | ||
vdev_checkpoint_sm_object(vdev_t *vd) | ||
vdev_checkpoint_sm_object(vdev_t *vd, uint64_t *sm_obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, the previous code was quite wrong to return an object number as an int
. Seems like the compiler could have warned about implicitly throwing away the high bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts exactly when I saw this, I'm not sure why the compiler wasn't more vocal about this.
module/zfs/vdev.c
Outdated
*/ | ||
vd->vdev_stat.vs_checkpoint_space = | ||
-vd->vdev_checkpoint_sm->sm_alloc; | ||
vd->vdev_spa->spa_checkpoint_info.sci_dspace += | ||
vd->vdev_stat.vs_checkpoint_space; | ||
} else if (error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error != 0
module/zfs/vdev.c
Outdated
@@ -3031,6 +3036,10 @@ vdev_load(vdev_t *vd) | |||
return (error); | |||
} | |||
space_map_update(vd->vdev_obsolete_sm); | |||
} else if (error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error != 0
ASSERT(obsolete_sm_obj != 0); | ||
uint64_t obsolete_sm_obj; | ||
VERIFY0(vdev_obsolete_sm_object(vd, &obsolete_sm_obj)); | ||
ASSERT3U(obsolete_sm_obj, !=, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should either change 0
to DMU_NO_OBJECT
in this context, or change vdev_obsolete_sm_object() to simply return 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure how to handle an error in this case so I opted for a VERIFY0
to be consistent with the zap_add
and zap_remove
operations down a few lines. They can technically fail for exactly the same reason, even though it's really unlikely post import. At the time it seemed preferable to ignoring the error which is how the current code effectively handles this.
/* | ||
* Gets the obsolete count are precise spacemap object from the vdev's ZAP. | ||
* On success are_precise will be set to reflect is the counts are precise. | ||
* All other errors are returned to the caller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... which all callers assert/verify is 0. So there's no behavior change but I guess this is ready for callers that can handle the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that was my intent. This way the import path could be updated and the other callers could functionally remain unchanged. None of the other callers were set up to be able to do anything reasonable in case of an error, and I wanted to keep the change small.
Thanks for doing this! |
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attemping to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool in import a far less likely, therefore the behavior for call paths was not modified. Signed-off-by: Brian Behlendorf <[email protected]>
ca5150b
to
5cb7417
Compare
Refreshed based on review feedback. I don't have any great ideas for how to handle a failure in the sync task so I left that as a
|
Codecov Report
@@ Coverage Diff @@
## master #7921 +/- ##
===========================================
+ Coverage 66.74% 78.69% +11.95%
===========================================
Files 314 378 +64
Lines 97292 114240 +16948
===========================================
+ Hits 64934 89900 +24966
+ Misses 32358 24340 -8018
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this Brian, it's great that these errors are propagated now.
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attemping to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool in import a far less likely, therefore the behavior for call paths was not modified. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7809 Closes #7921
This reverts commit e927fc8. Reviewed by: Tim Chase <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7921
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attemping to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool in import a far less likely, therefore the behavior for call paths was not modified. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7809 Closes openzfs#7921
This reverts commit e927fc8. Reviewed by: Tim Chase <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7921
The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attemping to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool in import a far less likely, therefore the behavior for call paths was not modified. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7809 Closes openzfs#7921
Motivation and Context
It's possible to hit either of these two ASSERTs when importing a damaged
using the extreme rewind functionality. This should be handled gracefully
as was done in e927fc8. User are unlikely to encounter this failure mode,
but the ZTS tests do stress these call paths with debugging enabled.
See stacks below.
Description
The obsolete space map and obsolete counts objects may not be
accessible from the vdev's ZAP when it has been damaged. This may
be the case when performing an extreme rewind to import the pool.
It should be gracefully handled in the same way as e927fc8.
How Has This Been Tested?
Locally compiled, pending additions testing by the bot which do occasionally hit this.
http://build.zfsonlinux.org/builders/Ubuntu%2018.04%20x86_64%20%28TEST%29/builds/1186/steps/shell_9/logs/console
Types of changes
Checklist:
Signed-off-by
.