Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full Issue 2094 Fix #2112

Closed
wants to merge 5 commits into from
Closed

Full Issue 2094 Fix #2112

wants to merge 5 commits into from

Conversation

behlendorf
Copy link
Contributor

An additional test script to ensure 'zpool import' regressions are caught prior to those changes being merged.

@behlendorf behlendorf added this to the 0.6.3 milestone Feb 7, 2014
@behlendorf behlendorf added the Bug label Feb 7, 2014
@dweeezil
Copy link
Contributor

dweeezil commented Feb 7, 2014

I like this concept quite and this is definitely a step in the right direction as far as regression testing is concerned.

Given recent experience, however, it seems it would be a good idea to eventually have reference pools that represent the myriad of optional states in which a pool can exist. It would be nice to document all of them (another issue altogether), but some of the ones that come to mind are (in no particular order):

  • Empty vs. non-empty DMU_OT_UNLINKED_SET per file system
  • State of volatile feature (flags) such as async_destroy
  • Presence/absence of other optional items in the directory object of the MOS; all the DMU_POOL_... stuff such as "scan", etc.
  • Deprecated features such as old-style in-progress scans
  • Pools created from different endian and/or bit-width systems

@behlendorf
Copy link
Contributor Author

@dweeezil I couldn't agree more. The more reference images we have in different states the better. We do some of this testing today with ztest just not across different versions of the code base.

@behlendorf
Copy link
Contributor Author

I've refreshed this patch stack.

The reference images have been moved to a separate https://github.com/zfsonlinux/zfs-images repository to avoid bloating the core zfs repository. The new repository has been added as a git submodule under scripts/zfs-images. The git submodule isn't required for anything but running the zimport.sh test script and won't be checked out by default so it should have very little impact. However, it does making pulling in the images easy if you need them and they can be fetched as a tarball using the following github URL:

https://github.com/zfsonlinux/zfs-images/tarball/master

@behlendorf
Copy link
Contributor Author

@dweeezil @ryao I've refreshed this patch stack to include the full set of proposed changes to address #2094. It obsoletes #2109 and I'd appreciate your feedback and any additional testing you can offer. To summarize the changes which are largely based on you guys works.

  • Revert the zbookmark_t change. Long term it would be nice to preserve this functionality but I can't see an easy way of doing it without significant code change. That's something I want to avoid for now so it can wait.
  • Patch to automatically detect the issue and prompt users via 'zpool status' that they must scrub their pool. The async destroy is NOT handled aside from causing the import to fail and I believe we agreed that's the way we want it.
  • Add a zimport.sh script which I'll add to my automated testing to verify a variety of pools import using old ZoL tags, master branch, and the master branch plus whatever commit is being tested. I've verified this script would have caught this regression and prevented it from being merged.

@behlendorf behlendorf mentioned this pull request Feb 11, 2014
@@ -171,6 +171,7 @@ struct spa {
uint8_t spa_scrub_reopen; /* scrub doing vdev_reopen */
uint64_t spa_scan_pass_start; /* start time per pass/reboot */
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
uint64_t spa_scan_pass_errata; /* errata issues detected */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure that this is meant for reuse, we probably should pick a more general name for it and split it out into a separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good thought. I suspect we may end up preserving this functionality just in case we ever need it again. I'll rename the variable to spa_errata and refactor the commit when I update this branch.

@ryao
Copy link
Contributor

ryao commented Feb 11, 2014

@behlendorf I agree with @dweeezil's points. Also, I have read the C code in depth and skimmed through the shell scripts; and added some comments. I think this should be fine to merge after the one about the fall-thru handling is resolved. My other comments on the code are less important, although I hope that you will consider them.

@dweeezil
Copy link
Contributor

I finally got a chance to look at this (behlendorf/zfs@7e35fd5). Here are my comments:

  1. The error handling in dsl_scan_init() looks to be just OK since the second zap_lookup() ought to never return ENOENT (since the first one has already returned EOVERFLOW) but it might be a bit clearer to return right away should it fail. Also, it would be nice if we could return something more specific in the scn_async_destroying case that would propagate back to the user as something else other than the error they get when EOVERFLOW is returned. I've not looked into whether this is possible.
  2. I do agree that the "errata flag" in the spa might be better named something more generic like, say "spa_errata" should it need to be kept around and potentially used for other non scan-related things.
  3. As to the test script, I only skimmed it and my only comment is to wonder whether all distros contain "curl" or whether there ought to be a helper function that tries curl, wget, etc. and then errors if none exist (or is this all something that's handled during the autoconf?).

I've not had a chance to actually try running with this commit yet but hopefully I'll get a chance tomorrow (Wednesday) during the day.

@dweeezil
Copy link
Contributor

@behlendorf I was able to test your patch and it works perfectly well, however, I realized it would be nice for a plain "zpool import" to be able to report the erratum as well. To that end, I've worked up a slightly different approach to reporting the error in dweeezil/zfs@f115c41 which layers on your patch. It adds a new pool status which allows for the erratum to be reported in a more "natural" way within the zpool command.

@behlendorf
Copy link
Contributor Author

@dweeezil @ryao I've refreshed the patch stack to incorporate your feedback. The major changes are as follows.

  • You're absolutely right @dweeezil, there's existing infrastructure we should be using for generating import/status messages. I've squashed your changes in to this patch and took them a step farther.
  • The spa errata infrastructure was made more generic and at the same time consistent with the libzfs_status.c reporting. It is designed such that the worst errata detected (if any) should be set in spa->spa_errata using the zpool_errata_t type. This errata identifier is then attached to the pool configuration for import/status to consume.
  • Adding the errata to the pool config ensures we remain user/kernel space compatible between different versions of ZoL. Older versions of the utilities will newer kernel modules will be ignorant of the errata which is safe. Newer versions of the utilities with older kernel modules will cleanly handle the optional errata entry in the pool config.
  • The ASYNC_DESTROY case is now cleanly handled with a unique errata identifier and a useful error message is printed by 'zpool import' when this case is detected.
  • TODO - Based on @ryao's suggestion the reference link was updated to be consistent with all the others. However, I still haven't had a chance to write an errata page for the website. @ryao it would be very helpful if you could throw something together and open a pull request against the website.
  • TODO - I've done some basic testing of this but not as much as I normally would because I'm very short on time and wanted to post the refreshed changes for comment. Related to this I'd like to verify the ASYNC_DESTROY case works as expected but I haven't yet constructed a test pool.
  • As for the test script I'm less concerned about getting every detail of that right the first time. It's primarily a development and test tool and we can refine it as needed going forward. My main goal is to get something in the tree with this change so I can begin using it for automated testing. I fully expect that we'll refine it going forward.
  • @ryao I looked in to splitting this in to two commits as you suggested but I really wasn't happy with the result. I think the change is more understandable if it all goes in at once in this case, and it's not particularly large.

@dweeezil
Copy link
Contributor

@behlendorf I visually reviewed the patch and it looks good to me. I'm going run it on one of my test systems this evening and plan on (trying to) make test pools with an in-progress async destroy and one with an in-progress scrub. This should give us a good foundation going forward should any other such incompatibilities come up.

@dweeezil
Copy link
Contributor

@behlendorf Quick status update since I obviously didn't get to this yesterday: I've managed to make a test pool with a bptree_obj in it. For some reason, the patch isn't detecting it and to make matters work, I bump into an assert the next time I try to destroy a file system (somewhat as expected because the contents of the object are bogus). More later when I get a chance to look at it a bit closer.

@behlendorf
Copy link
Contributor Author

@dweeezil If you could open a pull request with the test image against https://github.com/zfsonlinux/zfs-images I can dig in to this as well. I've just started carefully testing this patch as well, and I'm keen to finalize these changes and move on.

@behlendorf
Copy link
Contributor Author

I'm putting together the new errata page, I should have something in the next day or so.

@dweeezil
Copy link
Contributor

@behlendorf I was in a bit of a hurry before and forgot that we can't detect the problem if there's only a bptree_obj but no scan (wonder if there would be a reasonable way to do that). In any case, if you want to take a peek at my bogus pool, it's at http://images.chase2k.com/tank.bptree.xz. It has only a bptree_obj but no scan. I'll make one with both a scan and a bptree_obj tomorrow and I'm sure that'll work.

@dweeezil
Copy link
Contributor

@behlendorf I'll be making a pull request against https://github.com/zfsonlinux/zfs-images with some bptree_obj entries. I'm going to delete the one referenced in my last message.

@ryao
Copy link
Contributor

ryao commented Feb 18, 2014

@behlendorf My root pool is damaged and I have not yet diagnosed the failure mode. I am out of action while I address that. I wanted to use Illumos to perform repairs, but my pool is affected by this erratum, so I reached out to the Illumos developers in #illumos on freenode for some assistance in getting an Illumos build with this correction. Consequently, igork in #illumos agreed to help me and went through the process of building a dilos ISO with the erratum fix applied. This process was very informative and I have some remarks.

In particular, I think we should split the erratum fix into three patches. The first would print a notice to dmesg. The second would add the add the erratum reporting infrastructure. The third would hook the erratum fix into the erratum reporting infrastructure. We both have written non-trivial portions of this patch and it is not clear who wrote what to external parties. Splitting it would ensure that attribution is clear. Additionally, a split would make it easier for other Open ZFS implementations to grab the errata fix without having to go through the reporting infrastructure changes, which break the JNI bindings.

@olw2005
Copy link

olw2005 commented Feb 18, 2014

@behlendorf Any chance of the "Revert changes to zbookmark_t" portion of this fix getting applied to main? Would help prevent further spread of the issue. I've been manually applying it on the systems I'm using to test other fixes but for the unwary it's a potential gotcha.

@FransUrbo
Copy link
Contributor

@olw2005 It's on it's way. As soon as they're happy with it (i.e., being sure it doesn't introduce yet more problems). Expect it in a few days...

@ryao
Copy link
Contributor

ryao commented Feb 18, 2014

@olw2005 If it is applied without the rest of this, everyone affected who updates to HEAD will be unable to use their pools. That is something we would like to avoid.

@olw2005
Copy link

olw2005 commented Feb 18, 2014

@ryao @FransUrbo facepalm Of course, that makes perfect sense. At least I learned more about "advanced git operations" in the process so it's all good. =)

@behlendorf
Copy link
Contributor Author

@ryao How about two patches. The latest incarnation of the patch doesn't include any logging to dmesg and I don't believe there's any need for it. The issue is clearly described through the output of the 'zpool import' / 'zpool status' and the errata webpages I'll be posting (hopefully I can finish that up soon).

I have no objection to slipping up the rest of the change as you described and generally agree that's a cleaner way to go. I'll get the patches refreshed over the next couple of days to split the patch stack.

@ryao @dweeezil Aside from this minor cleanup are both of you OK with merging this fix?

@dweeezil
Copy link
Contributor

@behlendorf Yes, I'm fine with the fix as it stands.

As a side note, if anyone felt so inclined, I realized it would be possible to somewhat hack-ishly detect the "incompatible bptree_obj with no scan" situation by dmu_read()-ing the bptree_phys_t entries and then reading one more. If the extra chunk had any nonzero bytes, the erratum is present. The number of entries to read is available in the bonus as bp_end-bt_begin in a a bptree_phys_t struct.

behlendorf and others added 5 commits February 20, 2014 21:18
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.1 zfs-0.6.2 master installed" \
     -p "all master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.1       zfs-0.6.2       master          0.6.2-180
-----------------------------------------------------------------
Clone SPL       Skip		Skip		Skip		Skip
Clone ZFS       Skip		Skip		Skip		Skip
Build SPL       Skip		Skip		Skip		Skip
Build ZFS       Skip		Skip		Skip		Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass		Pass		Pass		Pass
zol-0.6.1       Pass		Pass		Pass		Pass
zol-0.6.2-173   Fail		Fail		Pass		Pass
zol-0.6.2       Pass		Pass		Pass		Pass
master          Fail		Fail		Pass		Pass
installed       Pass		Pass		Pass		Pass

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Issue openzfs#2094
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2094
Both the zpool_import_status() and zpool_get_status() functions
return the zpool_status_t enum.  This explicit type should be used
rather than the more generic int type.

This patch makes no functional change and should only be considered
code cleanup.  It happens to have been done in the context of openzfs#2094
because that's when I noticed this issue.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Richard Yao <[email protected]
Issue openzfs#2094
From time to time it may be nessisary to inform the pool administator
about an errata which impacts their pool.  These errata will by shown
to the administrator through the 'zpool status' and 'zpool import'
output as appropriate.  The errata must clearly describe the issue
detected, how the pool is impacted, and what action should be taken
to resolve the situation.  Additional information for each errata will
be provided at http://zfsonlinux.org/msg/ZFS-8000-ER.

To accomplish the above this patch adds the required infrastructure to
allow the kernel modules to notify the utilities that an errata has
been detected.  This is done through the ZPOOL_CONFIG_ERRATA uint64_t
which has been added to the pool configuration nvlist.

To add a new errata the following changes must be made:

* A new errata identifier must be assigned by adding a new enum value
  to the zpool_errata_t type.  New enums must be added to the end to
  preserve the existing ordering.

* Code must be added to detect the issue.  This does not strictly
  need to be done at pool import time but doing so will make the
  errata visible in 'zpool import' as well as 'zpool status'.  Once
  detected the spa->spa_errata member should be set to the new enum.

* If possible code should be added to clear the spa->spa_errata member
  once the errata has been resolved.

* The show_import() and status_callback() functions must be updated
  to include an informational message describing the errata.  This
  should include an action message decribing what an administrator
  should do to address the errata.

* The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be
  updated to describe the errata.  This space can be used to provide
  as much additional information as needed to fully describe the errata.
  A link to this documentation will be automatically generated in the
  output of 'zpool import' and 'zpool status'.

Original-idea-by: Time Chaos <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Richard Yao <[email protected]
Issue openzfs#2094
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool import' or 'zpool status' command can be used to determine if
a pool is impacted.  A message similar to one of the following means your
pool must be scrubbed to restore compatibility.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can be imported using its name or numeric identifier,
         however there is a compatibility issue which should be corrected
         by running 'zpool scrub'
    see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

$ zpool status
  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:
...

If there was an async destroy in progress 'zpool import' will prevent
the pool from being imported.  Further advice on how to proceed will be
provided by the error message as follows.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #2 detected.
 action: The pool can not be imported with this version of ZFS due to an
         active asynchronous destroy.  Revert to an earlier version and
         allow the destroy to complete before updating.
         see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release, Gentoo backports, Ubuntu, Debian, Fedora, or EPEL
stable respositorys.  Only those using the HEAD version directly from
Github after the 0.6.2 but before the 0.6.3 tag are affected.

This patch does have one limitation that should be mentioned.  It will not
detect errata #2 on a pool unless errata #1 is also present.  It expected
this will not be a significant problem because pools impacted by errata #2
have a high probably of being impacted by errata #1.

End users can ensure they do no hit this unlikely case by waiting for all
asynchronous destroy operations to complete before updating ZoL.  The
presence of any background destroys on any imported pools can be checked
by running `zpool get freeing` as root.  This will display a non-zero
value for any pool with an active asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <[email protected]>
Reworked-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2094
@behlendorf
Copy link
Contributor Author

Refreshed based on above comments. My plan is to merge this to master tomorrow unless I hear an objection.

@ryao
Copy link
Contributor

ryao commented Feb 21, 2014

@behlendorf I have no objections.

@dweeezil
Copy link
Contributor

@behlendorf Looks good to me save for the typo in my name in behlendorf/zfs@070572f.

@behlendorf
Copy link
Contributor Author

I've no idea how I mangled your name like that. I'll fix it when I merge it.

@behlendorf
Copy link
Contributor Author

Merged as:

3965d2e Merge branch 'issue-2094'
4f2dcb3 Add erratum for issue #2094
ffe9d38 Add generic errata infrastructure
731782e Use expected zpool_status_t type
ed9e836 Revert changes to zbookmark_t
a16bc6b Add zimport.sh compatibility test script

Everyone, thank you for helping us get this cleanly sorted out!

@behlendorf behlendorf closed this Feb 21, 2014
@lundman
Copy link
Contributor

lundman commented Feb 28, 2014

Just reporting in that osx.zfs pull of ZOL detected my pool had the errata, and zpool scrub corrected everything. Thank you all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants