Improved error handling for extreme rewinds #7921

behlendorf · 2018-09-17T19:37:21Z

Motivation and Context

It's possible to hit either of these two ASSERTs when importing a damaged
pool using the extreme rewind functionality. This should be handled gracefully,
as was done in e927fc8. Users are unlikely to encounter this failure mode,
but the ZTS tests do stress these call paths with debugging enabled.
See stacks below.

Description

The obsolete space map and obsolete counts objects may not be
accessible from the vdev's ZAP when it has been damaged. This may
be the case when performing an extreme rewind to import the pool.
It should be gracefully handled in the same way as e927fc8.

[ 6623.289194] VERIFY(err == 0 || err == 2) failed
[ 6623.293600] PANIC at vdev_indirect.c:889:vdev_obsolete_sm_object()
[ 6623.298872] Showing stack for process 7648
[ 6623.298876] CPU: 0 PID: 7648 Comm: zpool Tainted: P           OE    4.15.0-1007-aws #7-Ubuntu
[ 6623.298877] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 6623.298879] Call Trace:
[ 6623.298889]  dump_stack+0x63/0x8b
[ 6623.298902]  spl_dumpstack+0x29/0x30 [spl]
[ 6623.298909]  spl_panic+0xc8/0x110 [spl]
[ 6623.299128]  vdev_obsolete_sm_object+0xe8/0xf0 [zfs]
[ 6623.299202]  vdev_load+0x93/0x470 [zfs]
[ 6623.299355]  vdev_load+0x43/0x470 [zfs]
[ 6623.299588]  spa_ld_load_vdev_metadata+0xbb/0x12d [zfs]
[ 6623.299660]  spa_load_impl+0x177/0x3e0 [zfs]
[ 6623.299735]  spa_load+0x4e/0xf0 [zfs]
[ 6623.299807]  spa_tryimport+0x11c/0x550 [zfs]
[ 6623.299896]  zfs_ioc_pool_tryimport+0x64/0xc0 [zfs]
[ 6623.299979]  zfsdev_ioctl+0x5bf/0x690 [zfs]
[ 6623.299995]  do_vfs_ioctl+0xa8/0x630
[ 6623.300013]  SyS_ioctl+0x79/0x90
[ 6623.300023]  do_syscall_64+0x73/0x130
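The panic comes from a lookup check that tolerates only success or a missing entry. A minimal sketch of that check, assuming `ENOENT` is 2 (as on Linux, matching the `err == 2` in the message) and assuming `ECKSUM` maps to `EBADE` as on ZFS on Linux; `lookup_error_tolerated()` is a hypothetical helper, not a ZFS function:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption: checksum errors are reported as EBADE, as on ZFS on Linux. */
#ifndef ECKSUM
#ifdef EBADE
#define	ECKSUM	EBADE
#else
#define	ECKSUM	1000	/* placeholder for platforms without EBADE */
#endif
#endif

/*
 * Hypothetical helper mirroring the failed check
 * VERIFY(err == 0 || err == ENOENT): only success or a missing
 * ZAP entry is tolerated; anything else trips the VERIFY.
 */
static bool
lookup_error_tolerated(int err)
{
	return (err == 0 || err == ENOENT);
}
```

A damaged ZAP can make `zap_lookup()` return a checksum error instead of `ENOENT`, so `lookup_error_tolerated(ECKSUM)` is false, which in the real code means a panic.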

How Has This Been Tested?

Locally compiled; pending additional testing by the bots, which do occasionally hit this.

http://build.zfsonlinux.org/builders/Ubuntu%2018.04%20x86_64%20%28TEST%29/builds/1186/steps/shell_9/logs/console

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.
Change has been approved by a ZFS on Linux member.

dweeezil

LGTM. How about using FTAG rather than the function name in the string?

[EDIT]: Actually, it looks like the first one would need s/vdev_obsolete_sm_objset/vdev_obsolete_sm_object/ anyway.

behlendorf · 2018-09-18T16:26:41Z

Good idea, that would have prevented exactly the typo I made. Refreshed, and I updated the log message in vdev_checkpoint_sm_object() as well.

sdimitro

LGTM thanks for looking into this Brian.

ahrens · 2018-09-18T23:48:08Z

module/zfs/vdev_indirect.c

- * Returns the spacemap object, or 0 if it wasn't in the ZAP or the ZAP doesn't
- * exist yet.
+ * Returns the spacemap object, or 0 if it wasn't in the ZAP,
+ * the ZAP doesn't exist yet, or the ZAP is damaged.


I don't think that this semantic change is safe. Have you examined all callers to ensure that they can safely ignore the presence of this object?

ahrens · 2018-09-18T23:50:03Z

module/zfs/vdev_indirect.c

+	if (err != 0 && err != ENOENT) {
+		vdev_dbgmsg(vd, "vdev_load: %s failed to retrieve obsolete "
+		    "counts from vdev ZAP [error=%d]", FTAG, err);
+		ASSERT3S(err, ==, ECKSUM);


I don't think that this semantic change is safe. Have you examined all callers to ensure that we can treat the counts as imprecise? It looks like spa_vdev_remove_cancel_sync() would not behave correctly - it would leak a ref on SPA_FEATURE_OBSOLETE_COUNTS (and leave the VDEV_TOP_ZAP_OBSOLETE_COUNTS_ARE_PRECISE lying around).

Semantically, other than a log message, we haven't actually changed anything for a production build. If the zap_lookup() were to fail, and it certainly can, the return value would be the same. If the callers can't already handle that, and I agree it looks like they can't, then these code paths have not been entirely safe since they were added. The same looks to be true for vdev_obsolete_sm_object() and its callers.

My feeling is that the best way to handle this would be to rework the interface and callers to acknowledge that things aren't as simple as the entry either existing or not existing. There are other legitimate, if annoying, failure scenarios.

The only other strictly safe option I can see would be to convert the ASSERTs to VERIFYs, so that at least we crash the system before any damage can be done, which isn't great from a user perspective.

I should also mention that I only started looking at this because the CI hits this case not infrequently; it runs at least one test that performs an extreme pool rewind.

You're right, this should be a VERIFY, not an ASSERT. Or as you said, we'd need to change this to be able to return an error and make the callers handle that.
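The distinction matters because in ZFS the ASSERT family is compiled only into debug builds, whereas VERIFY is checked in every build. A minimal sketch of that difference, using hypothetical `SKETCH_ASSERT`/`SKETCH_VERIFY` macros keyed on an assumed `SKETCH_DEBUG` define; unlike the real macros, a failed check here is counted instead of panicking so the example can run:

```c
#include <assert.h>
#include <stdio.h>

/* Counts failed checks so the sketch can run; real ZFS would panic. */
static int sketch_check_failures = 0;

/* Always compiled in, like VERIFY(). */
#define	SKETCH_VERIFY(cond)						\
	do {								\
		if (!(cond)) {						\
			fprintf(stderr, "VERIFY(%s) failed\n", #cond);	\
			sketch_check_failures++;			\
		}							\
	} while (0)

/* Compiled in only for debug builds, like ASSERT(). */
#ifdef SKETCH_DEBUG
#define	SKETCH_ASSERT(cond)	SKETCH_VERIFY(cond)
#else
/* sizeof keeps the operands "used" without evaluating them. */
#define	SKETCH_ASSERT(cond)	((void) sizeof (cond))
#endif

static void
check_with_assert(int err)
{
	SKETCH_ASSERT(err == 0);
}

static void
check_with_verify(int err)
{
	SKETCH_VERIFY(err == 0);
}
```

Built without `SKETCH_DEBUG`, `check_with_assert(5)` is silently ignored while `check_with_verify(5)` records a failure, which is why an ASSERT alone offers no protection in a production build.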

I think whatever is decided, we should also change e927fc8, either on this review, or we can open a bug and I can make that change as a follow-on.

Let me see about reworking this to return an error and updating all the callers. I'd rather not be forced to disable the ZTS tests which can hit this.

This reverts commit e927fc8.

behlendorf · 2018-10-10T23:46:25Z

This is ready for another round of review. I've updated the interfaces to return an error when zap_lookup() fails for a reason other than ENOENT. This made it straightforward to update all the callers, since in the non-error case the basic logic didn't need to change. vdev_load() was updated to handle an error; the other callers, where a failure is far less likely and difficult to handle, were not.
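The reworked interface can be sketched as follows: a missing entry is reported as success with object 0, and any other lookup failure is handed back to the caller, which on the import path simply propagates it. This is a minimal sketch under stated assumptions; `sketch_vdev_t`, `sketch_zap_lookup()`, `sketch_obsolete_sm_object()` and `sketch_load()` are hypothetical stand-ins, not the ZFS functions, and `EIO` stands in for the `ECKSUM` a damaged ZAP would produce:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a vdev and its ZAP entry. */
typedef struct sketch_vdev {
	int		zap_error;	/* error the lookup will return */
	uint64_t	zap_value;	/* value stored in the ZAP entry */
} sketch_vdev_t;

/* Hypothetical stand-in for zap_lookup(). */
static int
sketch_zap_lookup(const sketch_vdev_t *vd, uint64_t *value)
{
	if (vd->zap_error != 0)
		return (vd->zap_error);
	*value = vd->zap_value;
	return (0);
}

/*
 * A missing entry is not an error and yields object 0; any other
 * failure is returned and *sm_obj is left untouched.
 */
static int
sketch_obsolete_sm_object(const sketch_vdev_t *vd, uint64_t *sm_obj)
{
	int err = sketch_zap_lookup(vd, sm_obj);

	if (err == ENOENT) {
		*sm_obj = 0;
		err = 0;
	}
	return (err);
}

/* Import-path caller: propagate the error so the import can fail cleanly. */
static int
sketch_load(const sketch_vdev_t *vd, uint64_t *sm_obj)
{
	int error = sketch_obsolete_sm_object(vd, sm_obj);

	if (error != 0)
		return (error);
	/* ... open the space map here when *sm_obj != 0 ... */
	return (0);
}
```

Callers outside the import path that have no sensible recovery would instead wrap the call in VERIFY0() or ASSERT0(), as discussed later in the review.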

This should resolve the crashes we've observed with the extreme rewind tests in the ZTS. Locally I ran the rewind tests 100 times and wasn't able to reproduce the original issue.

ahrens · 2018-10-11T15:50:31Z

include/sys/dmu.h

@@ -298,6 +298,7 @@ void zfs_znode_byteswap(void *buf, size_t size);
 #define	DMU_MAX_ACCESS (64 * 1024 * 1024) /* 64MB */
 #define	DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */

+#define	DMU_NO_OBJECT		0	/* no object id */


This is the same as DMU_META_DNODE_OBJECT, which could be confusing. I see that this is replacing ZFS_NO_OBJECT (which was confined to the ZPL at least), but it seems like maybe a bad idea to proliferate this.

module/zfs/spa.c

ahrens · 2018-10-11T15:54:04Z

module/zfs/vdev.c

 */
 int
-vdev_checkpoint_sm_object(vdev_t *vd)
+vdev_checkpoint_sm_object(vdev_t *vd, uint64_t *sm_obj)


Wow, the previous code was quite wrong to return an object number as an int. Seems like the compiler could have warned about implicitly throwing away the high bits.
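The hazard is that any object number beyond 32 bits loses its high bits on the way out. A minimal sketch with hypothetical helper names; `uint32_t` stands in for `int` because narrowing an out-of-range value into a signed type is implementation-defined in C. GCC diagnoses this kind of implicit narrowing only with `-Wconversion`, which neither `-Wall` nor `-Wextra` enables, which may explain the silence:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old shape (sketch): the object number leaves through a 32-bit return
 * type. The implicit narrowing is well defined for unsigned types but
 * silently drops the high bits; gcc -Wconversion would flag it.
 */
static uint32_t
sm_object_narrow(uint64_t obj)
{
	return (obj);
}

/* New shape (sketch): status as the return value, object via pointer. */
static int
sm_object_wide(uint64_t obj, uint64_t *sm_obj)
{
	*sm_obj = obj;
	return (0);
}
```

For example, `sm_object_narrow(0x100000005ULL)` yields 5, whereas `sm_object_wide()` hands back the full 64-bit value.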

My thoughts exactly when I saw this, I'm not sure why the compiler wasn't more vocal about this.

ahrens · 2018-10-11T15:57:12Z

module/zfs/vdev.c

 			 */
 			vd->vdev_stat.vs_checkpoint_space =
 			    -vd->vdev_checkpoint_sm->sm_alloc;
 			vd->vdev_spa->spa_checkpoint_info.sci_dspace +=
 			    vd->vdev_stat.vs_checkpoint_space;
+		} else if (error) {


error != 0

ahrens · 2018-10-11T15:57:36Z

module/zfs/vdev.c

@@ -3031,6 +3036,10 @@ vdev_load(vdev_t *vd)
 			return (error);
 		}
 		space_map_update(vd->vdev_obsolete_sm);
+	} else if (error) {


error != 0

ahrens · 2018-10-11T16:00:36Z

module/zfs/vdev_indirect.c

-	ASSERT(obsolete_sm_obj != 0);
+	uint64_t obsolete_sm_obj;
+	VERIFY0(vdev_obsolete_sm_object(vd, &obsolete_sm_obj));
+	ASSERT3U(obsolete_sm_obj, !=, 0);


Seems like we should either change 0 to DMU_NO_OBJECT in this context, or change vdev_obsolete_sm_object() to simply return 0.

I wasn't sure how to handle an error in this case, so I opted for a VERIFY0 to be consistent with the zap_add and zap_remove operations a few lines down. They can technically fail for exactly the same reason, even though it's really unlikely after import. At the time it seemed preferable to ignoring the error, which is effectively how the current code handles it.

ahrens · 2018-10-11T16:02:34Z

module/zfs/vdev_indirect.c

+/*
+ * Gets the obsolete count are precise spacemap object from the vdev's ZAP.
+ * On success are_precise will be set to reflect is the counts are precise.
+ * All other errors are returned to the caller.


... which all callers assert/verify is 0. So there's no behavior change but I guess this is ready for callers that can handle the error?

Right, that was my intent. This way the import path could be updated and the other callers could functionally remain unchanged. None of the other callers were set up to be able to do anything reasonable in case of an error, and I wanted to keep the change small.

ahrens · 2018-10-11T16:03:49Z

Thanks for doing this!

The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attempting to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool is imported is far less likely, so the behavior of those call paths was not modified. Signed-off-by: Brian Behlendorf <[email protected]>

behlendorf · 2018-10-11T20:36:58Z

Refreshed based on review feedback. I don't have any great ideas for how to handle a failure in the sync task so I left that as a VERIFY0. Better ideas are welcome.

Reverted ZFS_NO_OBJECT -> DMU_NO_OBJECT changes. That's out of scope for this change, so I decided to leave it as is.
vdev_indirect_state_sync_verify() switched to ASSERT0.
cstyle fixes.

codecov · 2018-10-12T07:00:25Z

Codecov Report

Merging #7921 into master will increase coverage by 11.95%.
The diff coverage is 82.35%.

@@             Coverage Diff             @@
##           master    #7921       +/-   ##
===========================================
+ Coverage   66.74%   78.69%   +11.95%     
===========================================
  Files         314      378       +64     
  Lines       97292   114240    +16948     
===========================================
+ Hits        64934    89900    +24966     
+ Misses      32358    24340     -8018

Flag	Coverage Δ
#kernel	`78.79% <74%> (?)`
#user	`68.08% <76.47%> (+1.34%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5d43cc9...5cb7417.
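The headline figures follow from the hit and line counts above: coverage is hits divided by lines, and misses are lines minus hits. A small sketch that reproduces them; `coverage_bp()` is a hypothetical helper, not part of Codecov, and reports coverage in basis points (hundredths of a percent) rounded to nearest:

```c
#include <assert.h>
#include <stdint.h>

/* Coverage in basis points: round(hits / lines * 10000). */
static uint64_t
coverage_bp(uint64_t hits, uint64_t lines)
{
	return ((hits * 10000 + lines / 2) / lines);
}
```

`coverage_bp(64934, 97292)` gives 6674 (66.74%) and `coverage_bp(89900, 114240)` gives 7869 (78.69%), a difference of 1195 basis points, which matches the reported +11.95%.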

sdimitro

Thanks for doing this Brian, it's great that these errors are propagated now.

The vdev_checkpoint_sm_object(), vdev_obsolete_sm_object(), and vdev_obsolete_counts_are_precise() functions assume that the only way a zap_lookup() can fail is if the requested entry is missing. While this is the most common cause, it's not the only cause. Attempting to access a damaged ZAP will result in other errors. The most likely scenario for accessing a damaged ZAP is during an extreme rewind pool import. Under these conditions the pool is expected to contain damaged objects and the import code was updated to handle this gracefully. Getting an ECKSUM error from these ZAPs after the pool is imported is far less likely, so the behavior of those call paths was not modified. Reviewed-by: Tim Chase <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7809 Closes #7921

This reverts commit e927fc8. Reviewed by: Tim Chase <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#7921

behlendorf requested review from dweeezil and sdimitro September 17, 2018 19:37

dweeezil approved these changes Sep 18, 2018

behlendorf force-pushed the assert-vdev_obsolete_sm_object branch from 4a174ce to 53b9c51 September 18, 2018 16:26

sdimitro approved these changes Sep 18, 2018

behlendorf added Reviewed and removed Reviewed labels Sep 18, 2018

ahrens requested changes Sep 18, 2018

behlendorf added Status: Code Review Needed and Status: Work in Progress and removed Status: Code Review Needed labels Sep 18, 2018

behlendorf mentioned this pull request Sep 26, 2018

zfs initialize performance enhancements #7955

Closed

Revert "Allow ECKSUM in vdev_checkpoint_sm_object()"

7b155ec

This reverts commit e927fc8.

behlendorf force-pushed the assert-vdev_obsolete_sm_object branch from 53b9c51 to ca5150b October 10, 2018 22:43

behlendorf added Status: Code Review Needed and removed Status: Work in Progress labels Oct 10, 2018

behlendorf changed the title from "Allow ECKSUM in vdev_obsolete_*()" to "Improved error handling for extreme rewinds" Oct 10, 2018

ahrens reviewed Oct 11, 2018

behlendorf force-pushed the assert-vdev_obsolete_sm_object branch from ca5150b to 5cb7417 October 11, 2018 20:33

ahrens approved these changes Oct 12, 2018

sdimitro approved these changes Oct 12, 2018

behlendorf added Status: Accepted and removed Status: Code Review Needed labels Oct 12, 2018

behlendorf closed this in d6c7458 Oct 12, 2018

behlendorf deleted the assert-vdev_obsolete_sm_object branch April 19, 2021 19:28
