Fixed data integrity issue when underlying disk returns error to zfs #12443
@@ -1179,7 +1179,21 @@ zil_lwb_flush_vdevs_done(zio_t *zio)
 		ASSERT3P(zcw->zcw_lwb, ==, lwb);
 		zcw->zcw_lwb = NULL;

-		zcw->zcw_zio_error = zio->io_error;
+		/*
+		 * Overwrite zcw_zio_error only if there is an error
+		 * in flush, otherwise propagate the zcw_zio_error
+		 * that is already set during the zil_lwb_write_done.
+		 * Refer https://github.com/openzfs/zfs/issues/12391
+		 * for more details.
+		 */
+		if (zio->io_error != 0) {
+			/*
+			 * If the flush has failed, then the write must have
+			 * been successful. VERIFY the same.
+			 */
+			VERIFY(zcw->zcw_zio_error == 0);
+			zcw->zcw_zio_error = zio->io_error;

Review comment: There should be a comment here explaining why we need to do this check. The comment should address the assertion as well.

+		}
+
 		ASSERT3B(zcw->zcw_done, ==, B_FALSE);
 		zcw->zcw_done = B_TRUE;

@@ -1253,6 +1267,16 @@ zil_lwb_write_done(zio_t *zio)
 	 * written out.
 	 */
 	if (zio->io_error != 0) {
+		zil_commit_waiter_t *zcw;

Review comment: I'm curious. The block of code you're adding looks similar to the block in the "flush done" function, i.e. starting at line 1173, but I see some differences, such as the fact that the block you're adding doesn't call:
nor does it set
Are these differences intentional? It's been a while since I've been in this code, so I'm just curious whether we should be using exactly the same logic in both cases, here and in the flush function. In this error case, do we still call the "flush done" function? I presume not, which is why this change is needed, but please correct me if I'm wrong.

+		for (zcw = list_head(&lwb->lwb_waiters); zcw != NULL;
+		    zcw = list_next(&lwb->lwb_waiters, zcw)) {
+			mutex_enter(&zcw->zcw_lock);
+			ASSERT(list_link_active(&zcw->zcw_node));
+			ASSERT3P(zcw->zcw_lwb, ==, lwb);
+			zcw->zcw_zio_error = zio->io_error;

Review comment: AFAIK this is the first place where we might set

Review comment:
it is in
so I think we could do this verification here too.

+			mutex_exit(&zcw->zcw_lock);
+		}
+
 		while ((zv = avl_destroy_nodes(t, &cookie)) != NULL)
 			kmem_free(zv, sizeof (*zv));
 		return;

Review comment: Let's use VERIFY3S.