
Commit

WIP issue 2094.
Commit 1421c89 expanded zbookmark_t by one integer, which grows the "scan" entry in the object directory from 24 to 25 integers. Any pool that has been resilvered after that patch therefore carries a 25-integer "scan" entry and becomes un-importable by ZFS implementations expecting only 24 integers.

This patch is a work-in-progress first cut at a fix.  It does two things:
first, it creates a new structure, zbookmark_phys_t, which represents the
desired on-disk format of the bookmark within the directory's "scan"
element.  Second, it relaxes the size check during import to allow
EOVERFLOW, which would otherwise cause the import to fail.

A "broken" pool can be fixed by importing and then re-resilvering
(scrub, etc.).
dweeezil committed Feb 1, 2014
1 parent 8b46464 commit c5223f5
Showing 4 changed files with 30 additions and 5 deletions.
2 changes: 1 addition & 1 deletion include/sys/dsl_scan.h
@@ -62,7 +62,7 @@ typedef struct dsl_scan_phys {
 	uint64_t scn_errors;	/* scan I/O error count */
 	uint64_t scn_ddt_class_max;
 	ddt_bookmark_t scn_ddt_bookmark;
-	zbookmark_t scn_bookmark;
+	zbookmark_phys_t scn_bookmark;
 	uint64_t scn_flags;	/* dsl_scan_flags_t */
 } dsl_scan_phys_t;
1 change: 1 addition & 0 deletions include/sys/spa.h
@@ -52,6 +52,7 @@ typedef struct spa_aux_vdev spa_aux_vdev_t;
 typedef struct ddt ddt_t;
 typedef struct ddt_entry ddt_entry_t;
 typedef struct zbookmark zbookmark_t;
+typedef struct zbookmark_phys zbookmark_phys_t;
 
 struct dsl_pool;
 struct dsl_dataset;
23 changes: 23 additions & 0 deletions include/sys/zio.h
@@ -255,6 +255,13 @@ extern const char *zio_type_name[ZIO_TYPES];
  * Therefore it must not change size or alignment between 32/64 bit
  * compilation options.
  */
+struct zbookmark_phys {
+	uint64_t	zb_objset;
+	uint64_t	zb_object;
+	int64_t		zb_level;
+	uint64_t	zb_blkid;
+};
+
 struct zbookmark {
 	uint64_t	zb_objset;
 	uint64_t	zb_object;
@@ -272,6 +279,14 @@ struct zbookmark {
 	(zb)->zb_func = FTAG;					\
 }
 
+#define	SET_BOOKMARK_PHYS(zb, objset, object, level, blkid)	\
+{								\
+	(zb)->zb_objset = objset;				\
+	(zb)->zb_object = object;				\
+	(zb)->zb_level = level;					\
+	(zb)->zb_blkid = blkid;					\
+}
+
 #define	ZB_DESTROYED_OBJSET	(-1ULL)
 
 #define	ZB_ROOT_OBJECT		(0ULL)
@@ -588,6 +603,14 @@ extern void spa_handle_ignored_writes(spa_t *spa);
 /* zbookmark functions */
 boolean_t zbookmark_is_before(const struct dnode_phys *dnp,
     const zbookmark_t *zb1, const zbookmark_t *zb2);
+static inline boolean_t zbookmark_is_before_phys(const struct dnode_phys *dnp,
+    const zbookmark_t *zb1, const zbookmark_phys_t *zb2)
+{
+	zbookmark_t zb2a;
+	SET_BOOKMARK_PHYS(&zb2a, zb2->zb_objset, zb2->zb_object,
+	    zb2->zb_level, zb2->zb_blkid);
+	return zbookmark_is_before(dnp, zb1, &zb2a);
+}
 
 #ifdef __cplusplus
 }
9 changes: 5 additions & 4 deletions module/zfs/dsl_scan.c
@@ -123,7 +123,7 @@ dsl_scan_init(dsl_pool_t *dp, uint64_t txg)
 	err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
 	    DMU_POOL_SCAN, sizeof (uint64_t), SCAN_PHYS_NUMINTS,
 	    &scn->scn_phys);
-	if (err == ENOENT)
+	if (err == ENOENT || err == EOVERFLOW)
 		return (0);
 	else if (err)
 		return (err);
@@ -421,7 +421,8 @@ dsl_scan_check_pause(dsl_scan_t *scn, const zbookmark_t *zb)
 		    (longlong_t)zb->zb_object,
 		    (longlong_t)zb->zb_level,
 		    (longlong_t)zb->zb_blkid);
-		scn->scn_phys.scn_bookmark = *zb;
+		SET_BOOKMARK_PHYS(&scn->scn_phys.scn_bookmark, zb->zb_objset,
+		    zb->zb_object, zb->zb_level, zb->zb_blkid);
 	}
 	dprintf("pausing at DDT bookmark %llx/%llx/%llx/%llx\n",
 	    (longlong_t)scn->scn_phys.scn_ddt_bookmark.ddb_class,
@@ -558,7 +559,7 @@ dsl_scan_check_resume(dsl_scan_t *scn, const dnode_phys_t *dnp,
 		 * If we already visited this bp & everything below (in
 		 * a prior txg sync), don't bother doing it again.
 		 */
-		if (zbookmark_is_before(dnp, zb, &scn->scn_phys.scn_bookmark))
+		if (zbookmark_is_before_phys(dnp, zb, &scn->scn_phys.scn_bookmark))
 			return (B_TRUE);
 
 		/*
@@ -823,7 +824,7 @@ dsl_scan_ds_destroyed(dsl_dataset_t *ds, dmu_tx_t *tx)
 		    (u_longlong_t)ds->ds_phys->ds_next_snap_obj);
 		scn->scn_phys.scn_flags |= DSF_VISIT_DS_AGAIN;
 	} else {
-		SET_BOOKMARK(&scn->scn_phys.scn_bookmark,
+		SET_BOOKMARK_PHYS(&scn->scn_phys.scn_bookmark,
 		    ZB_DESTROYED_OBJSET, 0, 0, 0);
 		zfs_dbgmsg("destroying ds %llu; currently traversing; "
 		    "reset bookmark to -1,0,0,0",

15 comments on commit c5223f5

@FransUrbo

"been resilvered subseqently to the patch" - does this mean: created before the patch and then resilvered after the patch?

@dweeezil
Owner Author

Sorry I wasn't more clear; I was pressed for time when I wrote the message. What I meant to say is that pools are effectively broken if they were resilvered after the referenced commit, which expanded the size of the bookmark structure.

@FransUrbo

No worries. Does this also occur if there is no resilver, only scrub(s)?

@dweeezil
Owner Author

Yes. Same thing. Anything that touches 'scan'.

@FransUrbo

Ok, thanx for the clarifications.

Tried this commit, but it didn't help me, unfortunately. I think my pool is further messed up: I get SPL errors in the zfeature.c:spa_feature_is_active() function, so I had to hack it to return 0 if the feature is async_destroy or empty_bpobj. I'm currently only interested in mounting it read-only at this time, so this is OK.

@behlendorf

@dweeezil I'm glad this was caught. When Prakash and I put together the original patch, it didn't occur to us that zbookmark_t was an on-disk structure. As you know, most/all of the other on-disk structures include "phys" as part of their name. I'll take a close look at this one on Monday; this needs to be a blocker for 0.6.3. We can't allow changes to the on-disk structure outside the feature flag framework.

@ryao

@ryao ryao commented on c5223f5 Feb 2, 2014

@dweeezil The phys_t suffix implies an on-disk structure. It would probably be best to rename the existing zbookmark_t to zbookmark_phys_t and then use a different name for what you are calling zbookmark_phys_t.

@behlendorf I agree completely. I am inclined to suggest that we revert the problem patch and bring it back into HEAD as a feature flag. However, we will need a solution to fix pools that have been potentially rendered unimportable by other implementations now that this has made it into HEAD. Doing that will require more than simply reverting, which would leave a number of people stranded.

@ryao

@ryao ryao commented on c5223f5 Feb 2, 2014

@behlendorf Once this is fixed, it would be great if the LLNL test farm would try importing pools created by new commits with older ZoL versions to confirm backward compatibility. A few simple file creations, a snapshot and a scrub should be adequate to catch most issues in this category.

@dweeezil
Owner Author

@ryao I've been thinking about this during the day (I kind of rushed things earlier when I was pressed for time), and I think the proper solution would be to replace the ddt_bookmark_t and zbookmark_t structs in the dsl_scan_phys_t structure with their individual members. That would reinforce the notion that dsl_scan_phys_t is an on-disk structure but that the bookmark-related structures aren't. Then the rather small bits of code that manage the dsl_scan_phys_t structure could be re-worked to pack/unpack these members into their ddt_bookmark_t and zbookmark_t structs as necessary.

I'll re-work this patch in a bit to use this technique, which it seems to me should obviate the need both for a feature flag and for any new structs at all.

@ryao

@ryao ryao commented on c5223f5 Feb 2, 2014

@dweeezil That sounds good. However, once that is written, we will need to take it a step further by developing some way to repair pools that have been damaged by this unintended disk-format change. This has been in HEAD long enough that the repercussions of failing to repair the damaged pools would be detrimental to the ZoL project's credibility. I imagine that we will have a more in-depth discussion about this when @behlendorf is available on Monday.

With that said, I have multiple offline matters that require my attention tomorrow, so I will not be able to do much until Monday myself. Whatever you do with this before Monday will likely be the only work done on this until then.

@dweeezil
Owner Author

I'm working on a patch that removes the zbookmark_t struct from dsl_scan_phys_t. The ddt_bookmark_t could also cause the same problem if it were changed but removing that struct is a pain in the butt.

@dweeezil
Owner Author

I just uploaded 10fe651 which has ONLY BEEN COMPILE TESTED. I'll be testing it tomorrow.

@dweeezil
Owner Author

The 10fe651 commit needed further refinement which I'm testing. I'll be posting the next version as a pull request in order to get it into the issue tracker.

@pyavdr

@pyavdr pyavdr commented on c5223f5 Feb 2, 2014

I tested this on a test pool, and yes, I can import a repaired pool with 0.6.1. Looks OK. Are there more side effects from this issue?

@dweeezil
Owner Author

@pyavdr Make sure you're using 2168d69 (from openzfs#2097) rather than one of the older ones. My second patch, 10fe651, didn't fully implement the compatibility shim (and would actually overwrite memory). You can use `zdb -dddd tank 1 | grep scan | sed -e 's;scan = ;;' | wc -w` (replace "tank" with your pool name) to tell whether your pool is broken. It should print 24; broken pools will print 25.
