zpool import progress kstat #8696

Merged 1 commit on May 9, 2019

@ofaaland ofaaland commented Apr 30, 2019

Motivation and Context

When an import requires a long MMP activity check, or when the user
requests pool recovery, the import may take a long time. The user may
not know why, or be able to tell whether the import is progressing or is
hung.

Description

Add a kstat which lists all imports currently being processed by the
kernel (currently only one at a time is possible, but the kstat allows
for more than one). The kstat is at
/proc/spl/kstat/zfs/import_progress.

The kstat contents are as follows:

pool_guid            load_state multihost_secs  max_txg pool_name
16667015954387398856 3           15             0       tank3

load_state: the value of spa_load_state
multihost_secs: number of seconds left in the multihost activity check, if any.
max_txg: current spa_load_max_txg, if rewind is occurring
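
As an illustration of the layout above, here is a minimal Python sketch of a parser for the kstat text. It is not part of this patch. The names `ImportProgress` and `parse_import_progress` are hypothetical, and the sketch assumes that the column header row starts with `pool_guid` and that any lines before it can be skipped. `pool_name` is taken as the remainder of the line because it is the last column.

```python
from typing import List, NamedTuple


class ImportProgress(NamedTuple):
    """One row of the import_progress kstat (hypothetical container)."""
    pool_guid: int
    load_state: int
    multihost_secs: int
    max_txg: int
    pool_name: str


def parse_import_progress(text: str) -> List[ImportProgress]:
    """Parse the kstat text into rows.

    Assumption: everything before the row whose first column is
    "pool_guid" is not data and is skipped.
    """
    rows = []
    seen_header = False
    for line in text.splitlines():
        # Split into at most 5 fields so the last one keeps the whole pool name.
        fields = line.split(None, 4)
        if not fields:
            continue
        if not seen_header:
            seen_header = fields[0] == "pool_guid"
            continue
        if len(fields) < 5:
            continue  # incomplete row; ignore rather than guess
        guid, state, secs, txg, name = fields
        rows.append(ImportProgress(int(guid), int(state), int(secs),
                                   int(txg), name.rstrip()))
    return rows


if __name__ == "__main__":
    # Path taken from the description above; only exists on a system
    # running a kernel module that includes this change.
    with open("/proc/spl/kstat/zfs/import_progress") as f:
        for row in parse_import_progress(f.read()):
            print(row)
```

An empty result means no import rows were found in the text.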

This could be used by outside tools, such as a pacemaker resource agent,
to report import progress, or as a part of manual troubleshooting. The
zpool import subcommand could also be modified to report this
information.
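
As a rough sketch of what such an outside tool might do, the Python below compares two successive samples of the kstat and produces one status line per import. The function name `report` and the row tuple layout are assumptions made for this example, and the second sample in the usage note is invented for illustration. An unchanged row is only a hint: it does not by itself prove that an import is hung.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout, mirroring the kstat columns:
# (pool_guid, load_state, multihost_secs, max_txg, pool_name)
Row = Tuple[int, int, int, int, str]


def report(prev: Iterable[Row], curr: Iterable[Row]) -> List[str]:
    """Return one status line per import in `curr`, compared against `prev`."""
    before: Dict[int, Row] = {row[0]: row for row in prev}
    lines = []
    for row in sorted(curr):
        guid, state, secs, txg, name = row
        if guid not in before:
            note = "first seen"
        elif before[guid] == row:
            note = "unchanged since last sample"
        else:
            note = "changed since last sample"
        lines.append(
            f"{name}: load_state={state} multihost_secs={secs} "
            f"max_txg={txg} ({note})"
        )
    return lines


if __name__ == "__main__":
    first = [(16667015954387398856, 3, 15, 0, "tank3")]
    # Invented follow-up sample, one second later, for illustration only.
    second = [(16667015954387398856, 3, 14, 0, "tank3")]
    for line in report(first, second):
        print(line)
```

A resource agent could call something like this on a timer and log the lines, leaving the decision about what counts as "too long" to its own timeout policy.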

How Has This Been Tested?

Manually tested to confirm the kstat contents, including imports with and without rewind and with and without the multihost activity check, both together and separately. Also issued multiple imports from userspace at the same time. Ran zloop and the MMP tests in ZTS.

Updated zpool import to use this procfile (See #8646).

@ofaaland ofaaland requested a review from behlendorf April 30, 2019 19:16
@ofaaland ofaaland force-pushed the b-import-progress-kstat branch from 49395e3 to c495184 Compare April 30, 2019 20:37
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label May 2, 2019
@ofaaland ofaaland requested review from dinatale2 and sdimitro May 2, 2019 15:50
ofaaland commented May 6, 2019

Hi @utopiabound is it possible for you to review this patch? Thanks.

@ofaaland ofaaland force-pushed the b-import-progress-kstat branch 3 times, most recently from c476d15 to 03a5aaa Compare May 8, 2019 00:49
ofaaland commented May 8, 2019

@behlendorf, I've addressed your comments above. I also addressed the issues we discussed offline:
Instead of reporting the start and end times of the activity check, the kstat now reports seconds remaining. I believe the failure code paths are now properly handled, and the load state is now updated everywhere.

When an import requires a long MMP activity check, or when the user
requests pool recovery, the import may take a long time.  The user may
not know why, or be able to tell whether the import is progressing or is
hung.

Add a kstat which lists all imports currently being processed by the
kernel (currently only one at a time is possible, but the kstat allows
for more than one).  The kstat is at
/proc/spl/kstat/zfs/import_progress.

The kstat contents are as follows:
pool_guid         load_state multihost_secs  max_txg pool_name
16667015954387398 3          15              0       tank3

load_state: the value of spa_load_state
multihost_secs:  seconds until the end of the multihost activity
                 check; if over, or none required, this is 0
max_txg: current spa_load_max_txg, if rewind is occurring

This could be used by outside tools, such as a pacemaker resource agent,
to report import progress, or as a part of manual troubleshooting.  The
zpool import subcommand could also be modified to report this
information.

Signed-off-by: Olaf Faaland <[email protected]>
@ofaaland ofaaland force-pushed the b-import-progress-kstat branch from 03a5aaa to 930f70a Compare May 8, 2019 00:56
codecov bot commented May 8, 2019

Codecov Report

Merging #8696 into master will increase coverage by 0.17%.
The diff coverage is 77.87%.

@@            Coverage Diff             @@
##           master    #8696      +/-   ##
==========================================
+ Coverage   78.73%    78.9%   +0.17%     
==========================================
  Files         381      381              
  Lines      117674   117744      +70     
==========================================
+ Hits        92649    92909     +260     
+ Misses      25025    24835     -190
Flag Coverage Δ
#kernel 79.42% <80.18%> (+0.11%) ⬆️
#user 67.69% <54.86%> (+0.46%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f02ecc...930f70a. Read the comment docs.

@behlendorf behlendorf left a comment

Thanks for addressing the review feedback so quickly.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 8, 2019
@behlendorf behlendorf merged commit ca95f70 into openzfs:master May 9, 2019
ofaaland commented May 9, 2019

> Instead of reporting the start and end times of the activity check, the kstat now reports seconds remaining

A note for future me: the reasoning is that the kernel knows with certainty how much wall clock time is left in the test, because it uses a monotonic clock source. However, translating that monotonic end time to system time is error-prone, because something (e.g. NTP) may change the system time during the test. So we just give seconds remaining.
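
To make that reasoning concrete, here is a small userspace Python sketch of the same idea. It is not the kernel code. The class name `ActivityCheck` and its `clock` parameter are hypothetical. The deadline is stored on a monotonic clock and only the remaining seconds are ever reported, so a change to the system time cannot skew the value.

```python
import math
import time


class ActivityCheck:
    """Track a deadline on a monotonic clock and report seconds remaining."""

    def __init__(self, duration_secs, clock=time.monotonic):
        # `clock` is injectable so the behaviour can be tested; by default
        # it is time.monotonic, which is unaffected by system time changes.
        self._clock = clock
        self._deadline = clock() + duration_secs

    def seconds_remaining(self):
        # Round up so a partially elapsed second still counts as remaining,
        # and clamp at 0 once the check is over.
        return max(0, math.ceil(self._deadline - self._clock()))


if __name__ == "__main__":
    check = ActivityCheck(3)
    while check.seconds_remaining() > 0:
        print("seconds remaining:", check.seconds_remaining())
        time.sleep(1)
    print("activity check finished")
```

Reporting an absolute end time instead would require converting the monotonic deadline to system time, which is exactly the step the note above describes as error-prone.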

@ofaaland ofaaland deleted the b-import-progress-kstat branch May 9, 2019 17:33
allanjude pushed a commit to allanjude/zfs that referenced this pull request Jun 7, 2019
When an import requires a long MMP activity check, or when the user
requests pool recovery, the import may take a long time.  The user may
not know why, or be able to tell whether the import is progressing or is
hung.

Add a kstat which lists all imports currently being processed by the
kernel (currently only one at a time is possible, but the kstat allows
for more than one).  The kstat is /proc/spl/kstat/zfs/import_progress.

The kstat contents are as follows:
pool_guid         load_state multihost_secs  max_txg pool_name
16667015954387398 3          15              0       tank3

load_state: the value of spa_load_state
multihost_secs:  seconds until the end of the multihost activity
                 check; if over, or none required, this is 0
max_txg: current spa_load_max_txg, if rewind is occurring

This could be used by outside tools, such as a pacemaker resource agent,
to report import progress, or as a part of manual troubleshooting.  The
zpool import subcommand could also be modified to report this
information.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olaf Faaland <[email protected]>
Closes openzfs#8696
allanjude pushed a commit to allanjude/zfs that referenced this pull request Jun 15, 2019