Extend deadman logic #6999
Conversation
@behlendorf I'll get this reviewed within the next day. Based on a reading of the commit comments, it all sounds very good to me. The one thing I looked for right away was the setting of zfs_deadman_synctime_ms, which I'm glad to see you lowered. Do you think we should have yet another module parameter to differentiate between the spa deadman and the vdev deadman?
@dweeezil thanks, it's appreciated. Any additional testing you can offer or recommended values for the tunings would be welcome.
I'm not sure I follow. What did you have in mind?
module/zfs/zio.c (Outdated)

	if (error == -1) {
		uint64_t delta = gethrtime() - zio->io_queued_timestamp;
		if (delta > spa_deadman_ziotime(zio->io_spa))
We probably want to add an additional check here such that zio_deadman() can only run once for a hung zio. Or alternately it runs at the lower frequency of zfs_deadman_ziotime_ms intervals, instead of every zfs_deadman_checktime_ms, which is what the code does now.
I opted to leave this as-is so that it's consistent with the zfs_deadman_synctime_ms behavior. The documentation has been updated accordingly.
LGTM save for some additional documentation we ought to have, as indicated in the individual comments.
I think the whole deadman logic is sufficiently important that it might eventually warrant its own man page, maybe zfs_deadman(5), to explain the overall system.
Also, in the commit comment regarding zio_interrupt(), we ought to mention something like "flaky hardware" as a potential cause of missing completion events.
@@ -37,6 +37,7 @@ extern "C" {
#define FM_EREPORT_ZFS_IO	"io"
#define FM_EREPORT_ZFS_DATA	"data"
#define FM_EREPORT_ZFS_DELAY	"delay"
#define FM_EREPORT_ZFS_DEADMAN	"deadman"
We'll want to add this to the zfs-events(5) man page and also probably beef up the existing documentation for the plain "delay" event. In particular, we'll want to mention the plain "delay" event's interaction with zfs_deadman_ziotime_ms.
Will do. I also opted simply to expand on the deadman behavior in the existing modules man page rather than add a new one.
cmd/ztest/ztest.c (Outdated)

	MSEC2NSEC(zfs_deadman_synctime_ms);

	(void) poll(NULL, 0, (int)NSEC2MSEC(delta));
	total += zfs_deadman_synctime_ms / 1000;
How about moving this after the "overdue = " line below to better align with the upstream code.
Will do.
 * continue - Attempt to recover from a "hung" I/O
 * panic - Panic the system
 */
char *zfs_deadman_failmode = "wait";
Presumably the rationale for using "wait" as the default instead of "continue" (which would probably be more useful) is not to change the current behavior (much)?
That, and it's possible that the "continue" behavior might have side effects or result in additional instability. We can revisit the default setting once we're more comfortable with the proposed recovery logic. But for now let's keep the core behavior the same aside from improving the logging.
module/zfs/spa_misc.c (Outdated)

	int error = 0;

	if (strcmp(failmode, "wait") == 0)
		spa->spa_deadman_failmode = ZIO_FAILURE_MODE_WAIT;
Maybe the default should be to set ZIO_FAILURE_MODE_WAIT, and then this can become a void-returning function and the return value check in spa_add() can be removed.
Sure.
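The suggestion above amounts to folding the fallback into the parser. A minimal sketch, with illustrative enum values standing in for the patch's ZIO_FAILURE_MODE_* constants:

```c
#include <string.h>

/*
 * Hypothetical sketch of the void-returning approach: any string other
 * than "continue" or "panic" (including NULL) falls back to WAIT, so
 * the caller never needs to check for an error.
 */
enum zio_failure_mode {
	ZIO_FAILURE_MODE_WAIT,
	ZIO_FAILURE_MODE_CONTINUE,
	ZIO_FAILURE_MODE_PANIC
};

static enum zio_failure_mode
deadman_failmode_parse(const char *failmode)
{
	if (failmode != NULL && strcmp(failmode, "continue") == 0)
		return (ZIO_FAILURE_MODE_CONTINUE);
	if (failmode != NULL && strcmp(failmode, "panic") == 0)
		return (ZIO_FAILURE_MODE_PANIC);
	return (ZIO_FAILURE_MODE_WAIT);	/* default and fallback */
}
```

A setter built on this parser can assign the result unconditionally, which is what makes the return value check in the caller removable.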
Hasn't blown up yet in zloop on a 4.9.74-unofficial_grsec kernel (kernexec and such tend to catch runtime issues pretty well; nothing yet).
Force-pushed from bafa4f8 to 467b8fe.
Refreshed. After discovering one more possible way
Force-pushed from 6e09952 to c8b3299.
Codecov Report
@@ Coverage Diff @@
## master #6999 +/- ##
==========================================
- Coverage 75.37% 75.34% -0.03%
==========================================
Files 296 296
Lines 95539 95633 +94
==========================================
+ Hits 72014 72059 +45
- Misses 23525 23574 +49
Continue to review full report at Codecov.
Force-pushed from 954520f to 8adbd54.
Save debugging information automatically collected from the last three core dumps generated by ztest. This change does not modify the existing behavior when those log files are missing. Requires openzfs/zfs#6999. Signed-off-by: Brian Behlendorf <[email protected]>
The intent of this patch is to extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independent from, the pool failmode property. It can be set to wait, continue, or panic.
  * wait - Wait for the "hung" I/O (default)
  * continue - Attempt to recover from a "hung" I/O
  * panic - Panic the system
* Added a new `zfs_deadman_ziotime_ms` module option which is analogous to `zfs_deadman_synctime_ms` except that instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio.
* The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected.
* The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit fa603f8. Update zloop.sh to unconditionally pass -G to obtain additional debugging.
* The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detected a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event.
* The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an I/O has been submitted to the physical device but for some reason the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs.
* The `zfs_deadman_synctime_ms` default value was reduced from 1000s to 600s.
* Depending on how ztest fails there may be no cache file to move. This should not be considered fatal; collect the logs which are available and carry on.
* Add deadman test cases for spa_deadman() and zio_wait().
* Increase the default zfs_deadman_checktime_ms to 60s.

Signed-off-by: Brian Behlendorf <[email protected]>
Requires-spl: refs/pull/674/head
TEST_ZTEST_TIMEOUT=3600
The zdb(8) command may not terminate in the case where the pool gets suspended and there is a caller in zio_wait() blocking on an outstanding read I/O that will never complete. This can in turn cause ztest(1) to block indefinitely despite the deadman. Resolve the issue by setting the default failure mode for zdb(8) to panic. In user space we always want the command to terminate when forward progress is no longer possible. Signed-off-by: Brian Behlendorf <[email protected]>
In order to debug issues encountered by ztest during automated testing it's important that as much debugging information as possible be dumped at the time of the failure. The following changes extend the zloop.sh script in order to make it easier to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a limit on the number of core dumps generated. By default, the existing behavior is maintained and no limit is set.
* Add the `-l` option to create a 'ztest.core.N' symlink in the current directory to the core directory. This functionality is provided primarily for buildbot, which expects log files to have well known names.
* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump additional basic information on failure for later analysis.

Signed-off-by: Brian Behlendorf <[email protected]>
Requires-spl: refs/pull/674/head
TEST_ZTEST_TIMEOUT=3600
Patch looks great to me. 2 minor nitpicks, but otherwise I can't wait to get these changes on our systems.
\fBzfs_deadman_synctime_ms\fR milliseconds, continue to check for slow
operations every \fBzfs_deadman_checktime_ms\fR milliseconds.

Check time in milliseconds. This defines the frequency at which we check
for hung I/O and invoke the \fBzfs_deadman_failmode\fR behavior.
and potentially invoke....
export SYNCTIME_DEFAULT=600000
export ZIOTIME_DEFAULT=300000
export CHECKTIME_DEFAULT=60000
export FAILMODE_DEFAULT="wait"
Should these be detected just by `cat`ing the /sys/ directory? Might make it a bit more future-proof. Just a nitpick.
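The nitpick above, sketched out: read a tunable's current value from /sys/module/zfs/parameters/<name> and fall back to a compiled-in default when the file is missing (e.g. the module isn't loaded). The path layout is the standard Linux module-parameter sysfs tree; the helper name and fallback values are illustrative, not part of the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns the value read from a sysfs module
 * parameter file, or the supplied fallback if the file is absent or
 * unreadable.
 */
static long
read_zfs_param(const char *path, long fallback)
{
	FILE *fp = fopen(path, "r");
	long val;

	if (fp == NULL)
		return (fallback);
	if (fscanf(fp, "%ld", &val) != 1)
		val = fallback;
	(void) fclose(fp);
	return (val);
}
```

Usage would be along the lines of read_zfs_param("/sys/module/zfs/parameters/zfs_deadman_synctime_ms", 600000), which degrades gracefully to the hard-coded default in test environments.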
Description
The intent of this patch is to extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independent from, the pool failmode property. It can be set to wait, continue, or panic.
* Added a new `zfs_deadman_ziotime_ms` module option which is analogous to `zfs_deadman_synctime_ms` except that instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio.
* The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected.
* The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit fa603f8. Update zloop.sh to unconditionally pass -G to obtain additional debugging.
* The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detected a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event.
* The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an I/O has been submitted to the physical device but for some reason the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs.
* The `zfs_deadman_synctime_ms` default value was reduced from 1000s to 600s.
* Depending on how ztest fails there may be no cache file to move. This should not be considered fatal; collect the logs which are available and carry on.
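The stage restriction floated in the recovery-logic bullet above can be sketched as a simple predicate. The stage bit values below are illustrative placeholders, not the real zio_stage enum values from the ZFS source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the vdev I/O pipeline stage flags. */
#define	ZIO_STAGE_VDEV_IO_START	(1U << 0)
#define	ZIO_STAGE_VDEV_IO_DONE	(1U << 1)

/*
 * Hypothetical predicate: only leaf zios sitting in the vdev I/O
 * stages would be considered for a zio_interrupt() restart.
 */
static bool
deadman_may_restart(uint32_t stage)
{
	return ((stage &
	    (ZIO_STAGE_VDEV_IO_START | ZIO_STAGE_VDEV_IO_DONE)) != 0);
}
```

A check like this would keep the recovery path away from zios in earlier pipeline stages, where a missing completion event is not the plausible failure mode.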
Motivation and Context
Add some of the needed infrastructure to make it possible to root cause ztest "hangs" observed during automated testing. With this change applied, at least basic debugging information will be collected for any "hangs". This change can be further augmented with improvements to the debugging infrastructure. Issue #6901.
How Has This Been Tested?
Locally by running zloop.sh in-tree for approximately 4 days. Over this time period the deadman behaved as expected and properly terminated ztest when it appeared to be hung. Further analysis of the debug logs and cores obtained is still needed. The expectation is that they will provide some statistical insight into the most often observed failures.
Types of changes
Checklist:
Commits are signed off with `Signed-off-by`.