Fixed data integrity issue when underlying disk returns error to zfs #12443

Merged: 12 commits, Sep 13, 2021
21 changes: 20 additions & 1 deletion module/zfs/zil.c
@@ -1179,7 +1179,8 @@ zil_lwb_flush_vdevs_done(zio_t *zio)
ASSERT3P(zcw->zcw_lwb, ==, lwb);
zcw->zcw_lwb = NULL;

-	zcw->zcw_zio_error = zio->io_error;
+	if (zio->io_error != 0)
+		zcw->zcw_zio_error = zio->io_error;
Contributor

There should be a comment here explaining why this check is needed.
Also, we should VERIFY that zcw->zcw_zio_error == 0 before overwriting it with zio->io_error.
IIUC, that holds because we don't issue the flush if the write fails.
It should be a VERIFY rather than an ASSERT because
a) it's not on a hot code path and
b) it's critical for correctness.

The comment should address the assertion as well.
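
A minimal user-space sketch of the suggestion above may help make it concrete. It is not the code from this PR: `waiter_t`, `flush_done_one`, and the local `VERIFY` macro are hypothetical stand-ins for `zil_commit_waiter_t`, the per-waiter part of the flush-done callback, and the kernel macro. It assumes, as the comment does, that no earlier error can be recorded when the flush itself reports one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the kernel VERIFY macro: always checked, aborts on failure. */
#define	VERIFY(cond)	do { if (!(cond)) abort(); } while (0)

/* Hypothetical, reduced model of zil_commit_waiter_t. */
typedef struct waiter {
	int	zio_error;	/* models zcw_zio_error; starts at 0 */
	bool	done;		/* models zcw_done */
} waiter_t;

/* Models the per-waiter step of the "flush done" callback. */
static void
flush_done_one(waiter_t *w, int io_error)
{
	if (io_error != 0) {
		/*
		 * Assumption taken from the review: a failed write never
		 * issues the flush, so no earlier error can be recorded
		 * when the flush itself reports one.
		 */
		VERIFY(w->zio_error == 0);
		w->zio_error = io_error;
	}
	/* With io_error == 0, an error recorded earlier is left intact. */
	assert(!w->done);
	w->done = true;
}
```

In this model, a waiter that already carries a write error keeps it when the flush side completes with `io_error == 0`, which is the behavior the unconditional assignment would have broken.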


ASSERT3B(zcw->zcw_done, ==, B_FALSE);
zcw->zcw_done = B_TRUE;
@@ -1253,6 +1254,24 @@ zil_lwb_write_done(zio_t *zio)
* written out.
*/
if (zio->io_error != 0) {
/*
* Copy the write error to zcw, becaues the zil_lwb_write_done
* error is not propagated to zil_lwb_flush_vdevs_done, which
* will cause zil_commit_impl to return without committing
* the data.
* Refer https://github.com/openzfs/zfs/issues/12391
Contributor

  1. typo in becaues
  2. I find the comment here more confusing than useful. I'd prefer a comment in zil_commit_impl in the place where we check zcw_zio_error that explains the entire error propagation path (for both the write and flush done callbacks)
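
As a rough illustration of the propagation path that such a comment would describe, here is a small user-space model. All names (`waiter_t`, `write_done`, `flush_done`, `commit_failed`) are hypothetical and only mirror the roles discussed in this thread; what the committer does once it sees the error is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced model of zil_commit_waiter_t. */
typedef struct waiter {
	int	zio_error;	/* models zcw_zio_error; starts at 0 */
} waiter_t;

/* Write-done side: record a failed write for the waiter. */
static void
write_done(waiter_t *w, int io_error)
{
	if (io_error != 0)
		w->zio_error = io_error;
}

/* Flush-done side: record a flush error, never clear an earlier one. */
static void
flush_done(waiter_t *w, int io_error)
{
	if (io_error != 0)
		w->zio_error = io_error;
}

/* Committer side: the single place where the recorded error is checked. */
static bool
commit_failed(const waiter_t *w)
{
	return (w->zio_error != 0);
}
```

The point of the model is that the committer reads one field, so an error from either callback must survive until that check.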

Contributor Author

@problame thank you for the comments. I have updated the PR.

* for more details.
*/
zil_commit_waiter_t *zcw;
Member
@prakashsurya, Aug 16, 2021

I'm curious: the block of code you're adding looks similar to the block in the "flush done" function (starting at line 1173), but I see some differences. For example, the block you're adding doesn't call:

cv_broadcast(&zcw->zcw_cv);

nor does it set zcw_done.

Are these differences intentional? It's been a while since I've been in this code, so I'm wondering whether we should use exactly the same logic in both places, here and in the flush function.

In this error case, do we still call the "flush done" function? I presume not, which is why this change is needed, but please correct me if I'm wrong.

for (zcw = list_head(&lwb->lwb_waiters); zcw != NULL;
zcw = list_next(&lwb->lwb_waiters, zcw)) {
mutex_enter(&zcw->zcw_lock);
ASSERT(list_link_active(&zcw->zcw_node));
ASSERT3P(zcw->zcw_lwb, ==, lwb);
zcw->zcw_zio_error = zio->io_error;
Contributor

AFAIK this is the first place where we might set zcw_zio_error, so if the zcw is zero-initialized, we could ASSERT3S(zcw->zcw_zio_error, ==, 0);

Member

> if the zcw is zero-initialized

It is, in zil_alloc_commit_waiter. Further, we already rely on this in the flush function (with these changes):

VERIFY(zcw->zcw_zio_error == 0);

so I think we could do this verification here too.
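
The verification proposed here could look roughly like the following user-space sketch. It is an illustration under assumptions, not the merged code: `waiter_t`, `write_done_record_error`, and the local `VERIFY` macro are hypothetical, the waiter list is modeled as a plain array, and the per-waiter locking from the diff is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel VERIFY macro: always checked, aborts on failure. */
#define	VERIFY(cond)	do { if (!(cond)) abort(); } while (0)

/* Hypothetical, reduced model of zil_commit_waiter_t. */
typedef struct waiter {
	int	zio_error;	/* models zcw_zio_error; zero at allocation */
} waiter_t;

/*
 * Models the error branch of the write-done callback: copy the write
 * error to every waiter, after checking that nothing has set it yet.
 */
static void
write_done_record_error(waiter_t *waiters, size_t n, int io_error)
{
	if (io_error == 0)
		return;

	for (size_t i = 0; i < n; i++) {
		/* Relies on the waiter being zero-initialized. */
		VERIFY(waiters[i].zio_error == 0);
		waiters[i].zio_error = io_error;
	}
}
```

Like the block under review, this model neither marks the waiters done nor wakes them; it only records the error.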

mutex_exit(&zcw->zcw_lock);
}

while ((zv = avl_destroy_nodes(t, &cookie)) != NULL)
kmem_free(zv, sizeof (*zv));
return;