Skip to content

Commit

Permalink
Workaround for pools affected by openzfs#2094 erratum
Browse files Browse the repository at this point in the history
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool status' command can be used to determine if a pool is impacted.
A message similar to the following means your pool must be scrubbed to
restore compatibility by replacing the damaged dsl_scan_phys_t object.

  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:

	NAME                              STATE     READ WRITE CKSUM
	zol-0.6.2-173                     ONLINE       0     0     0
	  raidz1-0                        ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev0  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev1  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev2  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev3  ONLINE       0     0     0

errors: No known data errors

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2094
  • Loading branch information
ryao authored and behlendorf committed Feb 11, 2014
1 parent a798fa8 commit 7e35fd5
Show file tree
Hide file tree
Showing 6 changed files with 56 additions and 0 deletions.
13 changes: 13 additions & 0 deletions cmd/zpool/zpool_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -3903,6 +3903,19 @@ print_scan_status(pool_scan_stat_t *ps)
return;
}

/*
* Scan required due to known errata.
*/
if ((ps->pss_pass_errata & DSE_ZOL_2094) &&
(ps->pss_state == DSS_FINISHED || ps->pss_state == DSS_CANCELED)) {
(void) printf(gettext("pool compatibility issue detected.\n"));
(void) printf(gettext(
" see: https://github.com/zfsonlinux/zfs/issues/2094\n"));

This comment has been minimized.

Copy link
@ryao

ryao Feb 11, 2014

Author

We might want to write a custom page to describe this, so users don't have to read through the history.

(void) printf(gettext(
"action: To correct the issue run 'zpool scrub'.\n"));
return;
}

start = ps->pss_start_time;
end = ps->pss_end_time;
zfs_nicenum(ps->pss_processed, processed_buf, sizeof (processed_buf));
Expand Down
2 changes: 2 additions & 0 deletions include/sys/dsl_scan.h
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,8 @@ typedef enum dsl_scan_flags {
DSF_VISIT_DS_AGAIN = 1<<0,
} dsl_scan_flags_t;

#define DSL_SCAN_FLAGS_MASK (DSF_VISIT_DS_AGAIN)

/*
* Every pool will have one dsl_scan_t and this structure will contain
* in-memory information about the scan and a pointer to the on-disk
Expand Down
3 changes: 3 additions & 0 deletions include/sys/fs/zfs.h
Original file line number Diff line number Diff line change
Expand Up @@ -694,6 +694,7 @@ typedef struct pool_scan_stat {
/* values not stored on disk */
uint64_t pss_pass_exam; /* examined bytes per scan pass */
uint64_t pss_pass_start; /* start time of a scan pass */
uint64_t pss_pass_errata; /* additional informational errata */
} pool_scan_stat_t;

typedef enum dsl_scan_state {
Expand All @@ -704,6 +705,8 @@ typedef enum dsl_scan_state {
DSS_NUM_STATES
} dsl_scan_state_t;

#define DSE_ZOL_2094 0x01 /* Zol Issue #2094 */
#define DSL_SCAN_ERRATA_MASK (DSE_ZOL_2094)

/*
* Vdev statistics. Note: all fields should be 64-bit because this
Expand Down
1 change: 1 addition & 0 deletions include/sys/spa_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,7 @@ struct spa {
uint8_t spa_scrub_reopen; /* scrub doing vdev_reopen */
uint64_t spa_scan_pass_start; /* start time per pass/reboot */
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
uint64_t spa_scan_pass_errata; /* errata issues detected */
kmutex_t spa_async_lock; /* protect async state */
kthread_t *spa_async_thread; /* thread doing async task */
int spa_async_suspended; /* async tasks suspended */
Expand Down
36 changes: 36 additions & 0 deletions module/zfs/dsl_scan.c
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,41 @@ dsl_scan_init(dsl_pool_t *dp, uint64_t txg)
err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
DMU_POOL_SCAN, sizeof (uint64_t), SCAN_PHYS_NUMINTS,
&scn->scn_phys);
/*
* Detect if the pool contains the signature of #2094. If it
* does properly update the scn->scn_phys struct and notify the
* administrator via 'zpool status' to scrub the pool. In the
* unlikely event that an async destroy is in progress return
* an error. The destroy must be allowed to completely under
* the previous code and only then may be imported using the
* new code and the pool corrected with a 'zpool scrub'.
*
* See http://github.com/zfsonlinux/zfs/issue/2094
*/
if (err == EOVERFLOW) {
uint64_t zaptmp[SCAN_PHYS_NUMINTS + 1];
VERIFY3S(SCAN_PHYS_NUMINTS, ==, 24);
VERIFY3S(offsetof(dsl_scan_phys_t, scn_flags), ==,
(23 * sizeof (uint64_t)));

err = zap_lookup(dp->dp_meta_objset,
DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_SCAN,
sizeof (uint64_t), SCAN_PHYS_NUMINTS + 1, &zaptmp);
if (err == 0) {
uint64_t overflow = zaptmp[SCAN_PHYS_NUMINTS];

if (overflow & ~DSL_SCAN_FLAGS_MASK ||
scn->scn_async_destroying)
return (EOVERFLOW);

bcopy(zaptmp, &scn->scn_phys,
SCAN_PHYS_NUMINTS * sizeof (uint64_t));
scn->scn_phys.scn_flags = overflow;
spa->spa_scan_pass_errata |= DSE_ZOL_2094;
}
/* Fall-thru to the expected error handling */

This comment has been minimized.

Copy link
@ryao

ryao Feb 11, 2014

Author

This contains a problem that I fixed in my local branch, but failed to push on Thursday because I was pursuing the automated scrub @behlendorf suggested that I try implementing. In particular, the error handling here is meant for the first zap_lookup(), so a non-zero result in the second zap_lookup() should immediately return EOVERFLOW. Consequently, fall-thru error handling is not necessary because exiting this block means that there was no error after adjusting for the erratum.

This comment has been minimized.

Copy link
@behlendorf

behlendorf Feb 12, 2014

Owner

This was done intentionally to preserve the error code in the unlikely event the second zap_lookup() fails. If it were to happen I wanted to ensure we invoked the expected error handling paths all the way up the stack for this error code.

}

if (err == ENOENT)
return (0);
else if (err)
Expand Down Expand Up @@ -319,6 +354,7 @@ dsl_scan_done(dsl_scan_t *scn, boolean_t complete, dmu_tx_t *tx)
}

scn->scn_phys.scn_end_time = gethrestime_sec();
spa->spa_scan_pass_errata &= ~DSE_ZOL_2094;
}

/* ARGSUSED */
Expand Down
1 change: 1 addition & 0 deletions module/zfs/spa_misc.c
Original file line number Diff line number Diff line change
Expand Up @@ -1795,6 +1795,7 @@ spa_scan_get_stats(spa_t *spa, pool_scan_stat_t *ps)
/* data not stored on disk */
ps->pss_pass_start = spa->spa_scan_pass_start;
ps->pss_pass_exam = spa->spa_scan_pass_exam;
ps->pss_pass_errata = spa->spa_scan_pass_errata;

return (0);
}
Expand Down

0 comments on commit 7e35fd5

Please sign in to comment.