Review feedback

* Moved the total += zfs_deadman_synctime_ms / 1000 update in
  ztest_deadman_thread().
* spa_set_deadman_failmode() updated to return void and
  default to ZIO_FAILURE_MODE_WAIT.
* Improved documentation of all deadman module options.
* spa->spa_deadman_calls now updated in zio_deadman() so all
  invocations of the deadman behavior are counted.
* Only allow a zio to be re-dispatched if it's not already
  on a taskq since it can only safely exist on one taskq.
* Drop zio->io_lock prior to calling zio_deadman().

Signed-off-by: Brian Behlendorf <[email protected]>
Requires-spl: refs/pull/674/head
TEST_ZTEST_TIMEOUT=3600
behlendorf committed Jan 17, 2018
1 parent 1e3bad8 commit c8b3299
Showing 6 changed files with 76 additions and 41 deletions.
2 changes: 1 addition & 1 deletion cmd/ztest/ztest.c
@@ -6256,7 +6256,6 @@ ztest_deadman_thread(void *arg)
MSEC2NSEC(zfs_deadman_synctime_ms);

(void) poll(NULL, 0, (int)NSEC2MSEC(delta));
total += zfs_deadman_synctime_ms / 1000;

/*
* If the pool is suspended then fail immediately. Otherwise,
@@ -6277,6 +6276,7 @@ ztest_deadman_thread(void *arg)
* then it may be hung and is terminated.
*/
overdue = zs->zs_proc_stop + MSEC2NSEC(zfs_deadman_synctime_ms);
total += zfs_deadman_synctime_ms / 1000;
if (gethrtime() > overdue) {
fatal(0, "aborting test after %llu seconds because "
"the process is overdue for termination.", total);
2 changes: 1 addition & 1 deletion include/sys/spa.h
@@ -957,7 +957,7 @@ extern int spa_max_replication(spa_t *spa);
extern int spa_prev_software_version(spa_t *spa);
extern uint64_t spa_get_failmode(spa_t *spa);
extern uint64_t spa_get_deadman_failmode(spa_t *spa);
extern int spa_set_deadman_failmode(spa_t *spa, const char *failmode);
extern void spa_set_deadman_failmode(spa_t *spa, const char *failmode);
extern boolean_t spa_suspended(spa_t *spa);
extern uint64_t spa_bootfs(spa_t *spa);
extern uint64_t spa_delegation(spa_t *spa);
19 changes: 16 additions & 3 deletions man/man5/zfs-events.5
@@ -55,7 +55,7 @@ part here.
\fBchecksum\fR
.ad
.RS 12n
Issued when a checksum error have been detected.
Issued when a checksum error has been detected.
.RE

.sp
@@ -76,14 +76,27 @@ Issued when there is an I/O error in a vdev in the pool.
Issued when there have been data errors in the pool.
.RE

.sp
.ne 2
.na
\fBdeadman\fR
.ad
.RS 12n
Issued when an I/O is determined to be "hung", this can be caused by lost
completion events due to flaky hardware or drivers. See the
\fBzfs_deadman_failmode\fR module option description for additional
information regarding "hung" I/O detection and configuration.
.RE

.sp
.ne 2
.na
\fBdelay\fR
.ad
.RS 12n
Issued when an I/O was slow to complete as defined by the zio_delay_max module
option.
Issued when a completed I/O exceeds the maximum allowed time specified
by the \fBzio_delay_max\fR module option. This can be an indicator of
problems with the underlying storage device.
.RE

.sp
57 changes: 38 additions & 19 deletions man/man5/zfs-module-parameters.5
@@ -823,14 +823,36 @@ Default value: \fB0\fR.
.ad
.RS 12n
When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
milliseconds, a "slow spa_sync" message is logged to the debug log
(see \fBzfs_dbgmsg_enable\fR). If \fBzfs_deadman_enabled\fR is set,
all pending IO operations are also checked and if any haven't completed
within \fBzfs_deadman_synctime_ms\fR milliseconds, a "SLOW IO" message
is logged to the debug log and a "deadman" system event with the details of
the hung IO is posted.
milliseconds, or when an individual I/O takes longer than
\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
invoked as described by the \fBzfs_deadman_failmode\fR module option.
By default the deadman is enabled and configured to \fBwait\fR which results
in "hung" I/Os only being logged. The deadman is automatically disabled
when a pool gets suspended.
.sp
Use \fB1\fR (default) to enable the slow IO check and \fB0\fR to disable.
Default value: \fB1\fR.
.RE

.sp
.ne 2
.na
\fBzfs_deadman_failmode\fR (charp)
.ad
.RS 12n
Controls the failure behavior when the deadman detects a "hung" I/O. Valid
values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
.sp
\fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
"deadman" event will be posted describing that I/O.
.sp
\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
to the I/O pipeline if possible.
.sp
\fBpanic\fR - Panic the system. This can be used to facilitate an automatic
fail-over to a properly configured fail-over partner.
.sp
Default value: \fBwait\fR.
.RE

.sp
@@ -839,9 +861,8 @@ Use \fB1\fR (default) to enable the slow IO check and \fB0\fR to disable.
\fBzfs_deadman_checktime_ms\fR (int)
.ad
.RS 12n
Once a pool sync operation has taken longer than
\fBzfs_deadman_synctime_ms\fR milliseconds, continue to check for slow
operations every \fBzfs_deadman_checktime_ms\fR milliseconds.
Check time in milliseconds. This defines the frequency at which we check
for hung I/O and invoke the \fBzfs_deadman_failmode\fR behavior.
.sp
Default value: \fB5,000\fR.
.RE
@@ -853,10 +874,9 @@ Default value: \fB5,000\fR.
.ad
.RS 12n
Interval in milliseconds after which the deadman is triggered and also
the interval after which a pool sync operation is considered to be "hung"
if \fBzfs_deadman_enabled\fR is set.

See \fBzfs_deadman_enabled\fR.
the interval after which a pool sync operation is considered to be "hung".
Once this limit is exceeded the deadman will be invoked every
\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
.sp
Default value: \fB600,000\fR.
.RE
@@ -867,11 +887,10 @@ Default value: \fB600,000\fR.
\fBzfs_deadman_ziotime_ms\fR (ulong)
.ad
.RS 12n
Interval in milliseconds after which the deadman is triggered and also
the interval after which an individual IO operation is considered to be "hung"
if \fBzfs_deadman_enabled\fR is set.

See \fBzfs_deadman_enabled\fR.
Interval in milliseconds after which the deadman is triggered and an
individual IO operation is considered to be "hung". As long as the I/O
remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
milliseconds until the I/O completes.
.sp
Default value: \fB300,000\fR.
.RE
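
The timing rules in the parameter descriptions above can be illustrated with a small, idealised model. This is only a sketch: `deadman_invocations()` is a hypothetical helper, not part of ZFS. It assumes that checks land exactly on multiples of the check interval and that the deadman fires on every check once the elapsed time strictly exceeds the limit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a ZFS function): count how many periodic checks
 * would invoke the deadman for an operation that stays hung for hung_ms.
 * Assumes checks occur at exact multiples of checktime_ms and that a check
 * fires only when the elapsed time is strictly greater than limit_ms.
 */
static uint64_t
deadman_invocations(uint64_t hung_ms, uint64_t limit_ms, uint64_t checktime_ms)
{
	assert(checktime_ms > 0);

	if (hung_ms <= limit_ms)
		return (0);

	/* Checks at k * checktime_ms with limit_ms < k * checktime_ms <= hung_ms. */
	return (hung_ms / checktime_ms - limit_ms / checktime_ms);
}
```

Under this model, with the documented defaults for an individual I/O (a 300,000 ms limit and a 5,000 ms check time), an I/O that stays hung for 320 seconds would see four invocations, at 305, 310, 315 and 320 seconds.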
22 changes: 10 additions & 12 deletions module/zfs/spa_misc.c
@@ -300,7 +300,7 @@ int zfs_free_leak_on_eio = B_FALSE;
unsigned long zfs_deadman_synctime_ms = 600000ULL;

/*
* This value controls the maximum amount of time zio_wait() will block for
* This value controls the maximum amount of time zio_wait() will block for an
* outstanding IO. By default this is 300 seconds at which point the "hung"
* behavior will be applied as described for zfs_deadman_synctime_ms.
*/
@@ -551,7 +551,7 @@ spa_deadman(void *arg)

zfs_dbgmsg("slow spa_sync: started %llu seconds ago, calls %llu",
(gethrtime() - spa->spa_sync_starttime) / NANOSEC,
++spa->spa_deadman_calls);
spa->spa_deadman_calls + 1);
if (zfs_deadman_enabled)
vdev_deadman(spa->spa_root_vdev);

@@ -608,9 +608,7 @@ spa_add(const char *name, nvlist_t *config, const char *altroot)

spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);
spa->spa_deadman_ziotime = MSEC2NSEC(zfs_deadman_ziotime_ms);

if (spa_set_deadman_failmode(spa, zfs_deadman_failmode) != 0)
spa->spa_deadman_failmode = ZIO_FAILURE_MODE_WAIT;
spa_set_deadman_failmode(spa, zfs_deadman_failmode);

refcount_create(&spa->spa_refcount);
spa_config_lock_init(spa);
@@ -1803,21 +1801,17 @@ spa_get_deadman_failmode(spa_t *spa)
return (spa->spa_deadman_failmode);
}

int
void
spa_set_deadman_failmode(spa_t *spa, const char *failmode)
{
int error = 0;

if (strcmp(failmode, "wait") == 0)
spa->spa_deadman_failmode = ZIO_FAILURE_MODE_WAIT;
else if (strcmp(failmode, "continue") == 0)
spa->spa_deadman_failmode = ZIO_FAILURE_MODE_CONTINUE;
else if (strcmp(failmode, "panic") == 0)
spa->spa_deadman_failmode = ZIO_FAILURE_MODE_PANIC;
else
error = SET_ERROR(EINVAL);

return (error);
spa->spa_deadman_failmode = ZIO_FAILURE_MODE_WAIT;
}

uint64_t
@@ -2171,9 +2165,13 @@ param_set_deadman_failmode(const char *val, zfs_kernel_param_t *kp)
if ((p = strchr(val, '\n')) != NULL)
*p = '\0';

if (strcmp(val, "wait") != 0 && strcmp(val, "continue") != 0 &&
strcmp(val, "panic"))
return (SET_ERROR(-EINVAL));

mutex_enter(&spa_namespace_lock);
while ((spa = spa_next(spa)) != NULL)
(void) spa_set_deadman_failmode(spa, val);
spa_set_deadman_failmode(spa, val);
mutex_exit(&spa_namespace_lock);

return (param_set_charp(val, kp));
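
The division of labour in the hunks above (strict validation in the parameter setter, a forgiving default in the per-pool setter) can be sketched in user space roughly as follows. The enum and both function names are hypothetical stand-ins rather than ZFS identifiers, and the sketch returns a plain boolean instead of a kernel error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the ZIO_FAILURE_MODE_* constants. */
enum failmode { FAILMODE_WAIT, FAILMODE_CONTINUE, FAILMODE_PANIC };

/* Validation step: accept only the three documented strings. */
static bool
failmode_is_valid(const char *name)
{
	assert(name != NULL);

	return (strcmp(name, "wait") == 0 ||
	    strcmp(name, "continue") == 0 ||
	    strcmp(name, "panic") == 0);
}

/* Mapping step: never fails; any unrecognised string falls back to wait. */
static enum failmode
failmode_parse(const char *name)
{
	assert(name != NULL);

	if (strcmp(name, "continue") == 0)
		return (FAILMODE_CONTINUE);
	if (strcmp(name, "panic") == 0)
		return (FAILMODE_PANIC);
	return (FAILMODE_WAIT);
}
```

Because the mapping step always yields a usable mode, a caller in this sketch needs no error handling; rejecting bad input is left entirely to the validation step, which would run before any per-pool state is touched.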
15 changes: 10 additions & 5 deletions module/zfs/zio.c
@@ -1778,8 +1778,10 @@ zio_deadman_impl(zio_t *pio)
zfs_ereport_post(FM_EREPORT_ZFS_DEADMAN,
pio->io_spa, vd, zb, pio, 0, 0);

if (failmode == ZIO_FAILURE_MODE_CONTINUE)
if (failmode == ZIO_FAILURE_MODE_CONTINUE &&
taskq_empty_ent(&pio->io_tqent)) {
zio_interrupt(pio);
}
}

mutex_enter(&pio->io_lock);
@@ -1803,6 +1805,7 @@ zio_deadman(zio_t *pio)
if (!zfs_deadman_enabled || spa_suspended(spa))
return;

spa->spa_deadman_calls++;
zio_deadman_impl(pio);

switch (spa_get_deadman_failmode(spa)) {
@@ -1965,10 +1968,12 @@ zio_wait(zio_t *zio)
error = cv_timedwait_io(&zio->io_cv, &zio->io_lock,
ddi_get_lbolt() + MSEC_TO_TICK(zfs_deadman_checktime_ms));

if (error == -1) {
uint64_t delta = gethrtime() - zio->io_queued_timestamp;
if (delta > spa_deadman_ziotime(zio->io_spa))
zio_deadman(zio);
if (zfs_deadman_enabled && error == -1 &&
gethrtime() - zio->io_queued_timestamp >
spa_deadman_ziotime(zio->io_spa)) {
mutex_exit(&zio->io_lock);
zio_deadman(zio);
mutex_enter(&zio->io_lock);
}
}
mutex_exit(&zio->io_lock);
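
The combined condition in the zio_wait() hunk above can be read as a single predicate. A minimal sketch, assuming a hypothetical helper `deadman_should_fire()` that takes plain values where the kernel code reads them from the zio and the spa:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate mirroring the three-part test: the deadman must be
 * enabled, the timed wait must have expired, and the I/O must have been
 * queued for strictly longer than the per-I/O limit.
 */
static bool
deadman_should_fire(bool enabled, bool wait_timed_out, uint64_t now_ns,
    uint64_t queued_ns, uint64_t limit_ns)
{
	/* Sketch-only precondition: the clock is assumed not to go backwards. */
	assert(now_ns >= queued_ns);

	return (enabled && wait_timed_out && now_ns - queued_ns > limit_ns);
}
```

In the hunk itself, the lock is released before the handler runs and re-acquired afterwards; the sketch covers only the decision and leaves the locking out.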