Fixed data integrity issue when underlying disk returns error to zfs
The error seen by zil_lwb_write_done was not propagated to
zil_lwb_flush_vdevs_done. As a result, zil_commit_impl returned and the
application received a successful write even though ZFS had failed to
write the data to disk.
arun-kv committed Jul 29, 2021
1 parent 7eebcd2 commit d781ef5
Showing 1 changed file with 19 additions and 1 deletion.
module/zfs/zil.c: 20 changes (19 additions, 1 deletion)
@@ -1179,7 +1179,8 @@ zil_lwb_flush_vdevs_done(zio_t *zio)
 		ASSERT3P(zcw->zcw_lwb, ==, lwb);
 		zcw->zcw_lwb = NULL;
 
-		zcw->zcw_zio_error = zio->io_error;
+		if (zio->io_error != 0)
+			zcw->zcw_zio_error = zio->io_error;
 
 		ASSERT3B(zcw->zcw_done, ==, B_FALSE);
 		zcw->zcw_done = B_TRUE;
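The guard in this hunk matters because, with this commit, zil_lwb_write_done may already have stored a write error in zcw_zio_error before zil_lwb_flush_vdevs_done runs. If the flush callback's own io_error is 0 (the case the commit message describes, where the write error is not propagated to it), an unconditional assignment would overwrite the recorded failure with success. The minimal sketch below models only that ordering with a plain int slot; record_error_old, record_error_new and observed_error are hypothetical helpers, not ZFS functions, and the zcw_lock locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the pre-fix rule: the flush callback always
 * overwrites the recorded error, even when its own io_error is 0. */
void record_error_old(int *slot, int io_error)
{
	assert(slot != NULL);
	*slot = io_error;
}

/* Hypothetical model of the post-fix rule: only a non-zero error is
 * stored, mirroring the new "if (zio->io_error != 0)" guard. */
void record_error_new(int *slot, int io_error)
{
	assert(slot != NULL);
	if (io_error != 0)
		*slot = io_error;
}

/* Replays the order of events: the write callback has already stored
 * EIO, then the flush callback applies `record` with its own
 * flush_error. Returns the value the waiter is left holding. */
int observed_error(void (*record)(int *, int), int flush_error)
{
	int slot = EIO; /* models zcw_zio_error after a failed lwb write */

	record(&slot, flush_error);
	return slot;
}
```

With the unguarded rule, observed_error(record_error_old, 0) yields 0 and the failure is lost; with the guarded rule, observed_error(record_error_new, 0) yields EIO.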
@@ -1253,6 +1254,23 @@ zil_lwb_write_done(zio_t *zio)
 	 * written out.
 	 */
 	if (zio->io_error != 0) {
+		/*
+		 * Copy the write error to zcw, because the zil_lwb_write_done
+		 * error is not propagated to zil_lwb_flush_vdevs_done, which will
+		 * cause zil_commit_impl to return without committing the data.
+		 * Refer https://github.com/openzfs/zfs/issues/12391
+		 * for more details.
+		 */
+		zil_commit_waiter_t *zcw;
+		for (zcw = list_head(&lwb->lwb_waiters); zcw != NULL;
+		    zcw = list_next(&lwb->lwb_waiters, zcw)) {
+			mutex_enter(&zcw->zcw_lock);
+			ASSERT(list_link_active(&zcw->zcw_node));
+			ASSERT3P(zcw->zcw_lwb, ==, lwb);
+			zcw->zcw_zio_error = zio->io_error;
+			mutex_exit(&zcw->zcw_lock);
+		}
+
 		while ((zv = avl_destroy_nodes(t, &cookie)) != NULL)
 			kmem_free(zv, sizeof (*zv));
 		return;
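Taken together, the two hunks let every commit waiter attached to a failed lwb learn about the write error. The following is a simplified, single-threaded sketch under stated assumptions, not the ZFS code: each waiter is reduced to an error code and a done flag (loosely mirroring zcw_zio_error and zcw_done), the list_t waiter list and zcw_lock are replaced by a plain array with no locking, and struct waiter, write_done, flush_done, commit_wait and simulate_commit are hypothetical names. It also assumes, per the commit message, that the flush callback runs with an io_error of 0 even when the write failed. commit_wait only reports the error recorded in the waiter; what zil_commit_impl then does with a non-zero value is outside the scope of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for zil_commit_waiter_t: only the two fields
 * this commit touches are modeled. */
struct waiter {
	int error; /* models zcw_zio_error */
	int done;  /* models zcw_done */
};

/* Models the new error branch of zil_lwb_write_done: on a failed lwb
 * write, copy the error into every waiter attached to the lwb. */
void write_done(struct waiter *w, size_t n, int io_error)
{
	if (io_error != 0) {
		for (size_t i = 0; i < n; i++)
			w[i].error = io_error;
	}
}

/* Models zil_lwb_flush_vdevs_done after the fix: a zero io_error no
 * longer overwrites an error already recorded by write_done. */
void flush_done(struct waiter *w, size_t n, int io_error)
{
	for (size_t i = 0; i < n; i++) {
		if (io_error != 0)
			w[i].error = io_error;
		assert(!w[i].done);
		w[i].done = 1;
	}
}

/* Reports the error a committing thread would find in its waiter. */
int commit_wait(const struct waiter *w)
{
	assert(w->done);
	return w->error;
}

/* Runs one lwb with two waiters through both callbacks. write_error is
 * what the disk returned; the flush callback is assumed to see 0. */
int simulate_commit(int write_error)
{
	struct waiter w[2] = { { 0, 0 }, { 0, 0 } };

	write_done(w, 2, write_error);
	flush_done(w, 2, 0);
	assert(commit_wait(&w[0]) == commit_wait(&w[1])); /* waiters agree */
	return commit_wait(&w[0]);
}
```

In this model, simulate_commit(EIO) returns EIO and simulate_commit(0) returns 0. Dropping the guard in flush_done would make both calls return 0, which is the silent-success behavior the commit fixes.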