Fixed data integrity issue when underlying disk returns error to zfs #12443

Merged
merged 12 commits into from
Sep 13, 2021
26 changes: 25 additions & 1 deletion module/zfs/zil.c
@@ -1179,7 +1179,21 @@ zil_lwb_flush_vdevs_done(zio_t *zio)
ASSERT3P(zcw->zcw_lwb, ==, lwb);
zcw->zcw_lwb = NULL;

zcw->zcw_zio_error = zio->io_error;
/*
* Overwrite zcw_zio_error only if there is an error
* in flush, otherwise propagate the zcw_zio_error
* that is already set during the zil_lwb_write_done.
* Refer https://github.com/openzfs/zfs/issues/12391
* for more details.
*/
if (zio->io_error != 0) {
/*
* If the flush has failed, then the write must have
* been successful. VERIFY the same.
*/
VERIFY(zcw->zcw_zio_error == 0);
Review comment (Member):

Let's use VERIFY3S here.

zcw->zcw_zio_error = zio->io_error;
Review comment (Contributor):

There should be a comment here explaining why this check is needed.
Also, we should VERIFY that zcw->zcw_zio_error == 0 before overwriting it with zio->io_error.
If I understand correctly, we can assert that because we don't issue the flush if the write fails.
We should use VERIFY rather than ASSERT because
a) it's not on a hot code path, and
b) it's critical for correctness.

The comment should address the assertion as well.

}

ASSERT3B(zcw->zcw_done, ==, B_FALSE);
zcw->zcw_done = B_TRUE;
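
The rule discussed in this hunk (the flush callback overwrites the waiter's error only when the flush itself failed, and checks that no write error was recorded first) can be modeled outside the kernel. The following is a minimal sketch, not the ZFS implementation: `struct waiter_model`, `model_write_done`, and `model_flush_done` are hypothetical names, plain `assert` stands in for `VERIFY`, and locking and condition-variable signaling are left out.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for zil_commit_waiter_t. Only the two fields
 * relevant to the discussion are modeled.
 */
struct waiter_model {
	int zio_error;	/* models zcw_zio_error; 0 means no error */
	int done;	/* models zcw_done */
};

/* Models the error branch of the write-done callback. */
static void
model_write_done(struct waiter_model *w, int write_error)
{
	if (write_error != 0)
		w->zio_error = write_error;
}

/* Models the flush-done callback with the change from this hunk. */
static void
model_flush_done(struct waiter_model *w, int flush_error)
{
	if (flush_error != 0) {
		/*
		 * Assumption taken from the review discussion: a flush
		 * is only issued after a successful write, so no write
		 * error can be recorded at this point.
		 */
		assert(w->zio_error == 0);
		w->zio_error = flush_error;
	}
	/* A flush_error of 0 leaves any earlier write error untouched. */
	assert(!w->done);
	w->done = 1;
}
```

In this model, a failed write followed by a flush callback that reports no error leaves the write error in place, which is the behavior the added comment describes. Before the change, the unconditional assignment would have reset it to 0.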
@@ -1253,6 +1267,16 @@ zil_lwb_write_done(zio_t *zio)
* written out.
*/
if (zio->io_error != 0) {
zil_commit_waiter_t *zcw;
Review comment (Member, @prakashsurya, Aug 16, 2021):

I'm curious: the block of code you're adding looks similar to the block in the "flush done" function (starting at line 1173), but I see some differences. For example, this block doesn't call:

cv_broadcast(&zcw->zcw_cv);

nor does it set zcw_done.

Are these differences intentional? It's been a while since I've been in this code, so I'm wondering whether we should use exactly the same logic in both places, here and in the flush function.

In this error case, do we still call the "flush done" function? I presume not, which is why this change is needed, but please correct me if I'm wrong.

for (zcw = list_head(&lwb->lwb_waiters); zcw != NULL;
zcw = list_next(&lwb->lwb_waiters, zcw)) {
mutex_enter(&zcw->zcw_lock);
ASSERT(list_link_active(&zcw->zcw_node));
ASSERT3P(zcw->zcw_lwb, ==, lwb);
zcw->zcw_zio_error = zio->io_error;
Review comment (Contributor):

AFAIK this is the first place where we might set zcw_zio_error, so if the zcw is zero-initialized, we could add ASSERT3S(zcw->zcw_zio_error, ==, 0); here.

Reply (Member):

Regarding "if the zcw is zero-initialized":

It is, in zil_alloc_commit_waiter. Further, we already rely on this in the flush function (with these changes):

VERIFY(zcw->zcw_zio_error == 0);

So I think we could do this verification here too.

mutex_exit(&zcw->zcw_lock);
}

while ((zv = avl_destroy_nodes(t, &cookie)) != NULL)
kmem_free(zv, sizeof (*zv));
return;
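
The suggestion in the thread above (assert that each waiter's error is still 0 before the write-done path sets it, relying on zero-initialized allocation) can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `struct waiter_sketch` and the three helper functions are hypothetical, `calloc` stands in for whatever zeroing the real allocator performs, and plain `assert` stands in for `ASSERT3S`/`VERIFY`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified waiter; not the real zil_commit_waiter_t. */
struct waiter_sketch {
	int error;			/* models zcw_zio_error */
	struct waiter_sketch *next;	/* models the lwb_waiters linkage */
};

/*
 * Allocate a zeroed waiter and link it in front of head.
 * Returns the new list head, or NULL if allocation fails.
 */
static struct waiter_sketch *
waiter_sketch_alloc(struct waiter_sketch *head)
{
	struct waiter_sketch *w = calloc(1, sizeof (*w));

	if (w == NULL)
		return (NULL);
	w->next = head;
	return (w);
}

/*
 * Record a write error on every waiter in the list. Each waiter must
 * still carry error == 0, which holds if this is the first place the
 * field is written after a zeroing allocation. Returns the number of
 * waiters updated.
 */
static int
waiter_sketch_set_error(struct waiter_sketch *head, int write_error)
{
	int count = 0;

	for (struct waiter_sketch *w = head; w != NULL; w = w->next) {
		assert(w->error == 0);
		w->error = write_error;
		count++;
	}
	return (count);
}

/* Free every waiter in the list. */
static void
waiter_sketch_free(struct waiter_sketch *head)
{
	while (head != NULL) {
		struct waiter_sketch *next = head->next;

		free(head);
		head = next;
	}
}
```

The assertion only holds if nothing else writes the error field between allocation and the write-done callback, which is the premise the reviewers agree on in the thread.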