ZVOL: sync=always is not being honored #12391
Comments
Hi @ahrens, unless we are missing something, this looks like a severe issue. Given our limited understanding of the code base, we are unable to suggest a fix at this point. It would really help if you could take a look at this and provide some insights. Happy to debug and provide any further information.
Please verify that it happens without iSCSI - there are MANY layers here. Are you using open-iscsi? What is your iscsid.conf? What is the target config? What backend are you using?
Hi @bghira, I have tested using a local disk on the same VM (an Oracle VirtualBox VM with a SATA controller).
I got a similar message to the one above when I offlined the device:

blk_update_request: I/O error, dev sdc, sector 5253216 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
Sounds like you are running with a writeback cache?
Hi @bghira, there is no writeback cache.
There is a chance that zil_commit() returns without actually flushing the disk. See the comments in zil_lwb_write_done() and zil_lwb_flush_defer().
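For context, here is a paraphrased sketch of what zil_lwb_flush_defer() does, from a reading of module/zfs/zil.c (not the verbatim source): when an lwb finishes its write with no waiters of its own, its pending vdev flushes are handed off to the next lwb rather than issued immediately, which is why zil_commit() can return before the disks are flushed.

```c
/*
 * Paraphrased sketch of zil_lwb_flush_defer() (not verbatim source):
 * move the pending-flush vdev nodes from the completed lwb onto the
 * next lwb, so the disk flushes are issued later, with that lwb.
 */
static void
zil_lwb_flush_defer(lwb_t *lwb, lwb_t *nlwb)
{
	avl_tree_t *src = &lwb->lwb_vdev_tree;
	avl_tree_t *dst = &nlwb->lwb_vdev_tree;
	void *cookie = NULL;
	zil_vdev_node_t *zv;

	while ((zv = avl_destroy_nodes(src, &cookie)) != NULL) {
		avl_index_t where;

		/* Skip vdevs the next lwb is already going to flush. */
		if (avl_find(dst, zv, &where) == NULL)
			avl_insert(dst, zv, where);
		else
			kmem_free(zv, sizeof (*zv));
	}
}
```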
@jxdking Yes, this issue happens when all the disks in the pool are inaccessible. My expectation was that the application would wait until I fix the disk error and run zpool clear. For example: if someone is using network-attached devices in the zpool (without a local SLOG device), can a network outage of more than 1 min cause data loss (as in my first experiment with the iSCSI disk)?
This seems somewhat dubious to me. Certainly, zil_lwb_write_done can fail, but does that result in the zil_commit waiter returning success? Looking at zil_commit_impl, I see a fallback when the waiter records an error (sketched below). To me, this implies that an error when flushing the ZIL blocks is caught and handled by escalating to a general sync operation. Offlining a ZIL device can definitely result in data already written to the ZIL being lost, but I don't think it will lose future writes. Also, it sounds like he has offlined his entire pool, so probably everything should fail and error out.
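For reference, the fallback in question sits at the end of zil_commit_impl(); paraphrased from the OpenZFS source of this era (a sketch, not the verbatim code):

```c
	zil_commit_writer(zilog, zcw);
	zil_commit_waiter(zilog, zcw);

	if (zcw->zcw_zio_error != 0) {
		/*
		 * If writing out the ZIL blocks this thread waited on
		 * failed, fall back on spa_sync() to write the data
		 * out: block here until the txg has synced.
		 */
		txg_wait_synced(zilog->zl_dmu_pool, 0);
	}

	zil_free_commit_waiter(zcw);
```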
@IsaacVaughn I have done some experimenting by moving the error-propagation code (the zcw->zcw_zio_error assignment) from zil_lwb_flush_vdevs_done to zil_lwb_write_done,
because the error was actually occurring in the zil_lwb_write_done zio, and there zcw->zcw_zio_error = zio->io_error; was set to the actual error (EIO). After this change I could see that zil_commit_impl waits for spa_sync to flush the data, and spa_sync waits for the disk to become accessible (I manually took the disk offline). Once I brought the disk online (echo "running" > /sys/block/sdc/device/state), everything resumed from where it stopped. Below is the (truncated) stack trace after my change:

[ 1088.105111] zvol D 0 13811 2 0x00004000
[ 1088.105400] txg_sync D 0 13988 2 0x00004000
@arun-kv Moving the code that propagates the error … see lines 1245 to 1257 in 296a4a3 (the early return in zil_lwb_write_done). There, we should propagate the error into the waiter but don't do it. Could you revert your change and then either …
@problame Yes, that early return is hitting; that's the reason I moved the code there to propagate the error (debug message below) … And yes, as you said, my fix does not solve the issue; it was only added to check the behavior. I was thinking about a solution, which is to …
From my understanding, … You may try to remove or comment out the following code from zil_lwb_write_done, to see if it fixes your issue:

```c
	if (list_head(&lwb->lwb_waiters) == NULL && nlwb != NULL) {
		zil_lwb_flush_defer(lwb, nlwb);
		ASSERT(avl_is_empty(&lwb->lwb_vdev_tree));
		return;
	}
```
No, it did not fix the issue after commenting out the code. Below is the debug message I got; zil_lwb_flush_vdevs_done is not getting the error that zil_lwb_write_done is getting:

```
[ 345.173734] zil_lwb_write_done zio: 000000001627a0c4 zio_type: 2 error: 6 io_pipeline_trace: 24641781
[ 345.173739] zil_lwb_flush_vdevs_done zio: 0000000035d5870f zio_type: 0 error: 0 io_pipeline_trace: 17301505
```

For reference, the relevant region of zil_lwb_write_done:

```c
	if (zio->io_error != 0) {
		while ((zv = avl_destroy_nodes(t, &cookie)) != NULL)
			kmem_free(zv, sizeof (*zv));
		return;
	}

	if (list_head(&lwb->lwb_waiters) == NULL && nlwb != NULL) {
		zil_lwb_flush_defer(lwb, nlwb);
		ASSERT(avl_is_empty(&lwb->lwb_vdev_tree));
		return;
	}
```
@arun-kv

```c
	lwb->lwb_write_zio = zio_rewrite(lwb->lwb_root_zio,
	    zilog->zl_spa, 0, &lwb->lwb_blk, lwb_abd,
	    BP_GET_LSIZE(&lwb->lwb_blk), zil_lwb_write_done, lwb,
	    prio, ZIO_FLAG_CANFAIL | ZIO_FLAG_DONT_PROPAGATE |
	    ZIO_FLAG_FASTWRITE, &zb);
```

ZIO_FLAG_DONT_PROPAGATE prevents the error from bubbling up to the parent zio. However, I am not sure whether changing the flag here is the right solution; it may break other logic. Another option is to set zcw_zio_error in zil_lwb_write_done if there is any error, and then in zil_lwb_flush_vdevs_done only override zcw_zio_error when io_error is not 0.
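For concreteness, the first option amounts to dropping the flag from the zio_rewrite() call above (a sketch; per the commit message further below, this is what the eventual fix did):

```c
	/*
	 * Option 1 (sketch): issue the lwb write without
	 * ZIO_FLAG_DONT_PROPAGATE, so a child write error propagates
	 * to lwb_root_zio and is seen in zil_lwb_flush_vdevs_done().
	 */
	lwb->lwb_write_zio = zio_rewrite(lwb->lwb_root_zio,
	    zilog->zl_spa, 0, &lwb->lwb_blk, lwb_abd,
	    BP_GET_LSIZE(&lwb->lwb_blk), zil_lwb_write_done, lwb,
	    prio, ZIO_FLAG_CANFAIL | ZIO_FLAG_FASTWRITE, &zb);
```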
Thanks, both the solutions worked.

```diff
diff --git a/module/zfs/zil.c b/module/zfs/zil.c
index 78d0711cc..1827df0bd 100644
--- a/module/zfs/zil.c
+++ b/module/zfs/zil.c
@@ -1179,7 +1179,8 @@ zil_lwb_flush_vdevs_done(zio_t *zio)
 		ASSERT3P(zcw->zcw_lwb, ==, lwb);
 		zcw->zcw_lwb = NULL;
 
-		zcw->zcw_zio_error = zio->io_error;
+		if (zio->io_error != 0)
+			zcw->zcw_zio_error = zio->io_error;
 
 		ASSERT3B(zcw->zcw_done, ==, B_FALSE);
 		zcw->zcw_done = B_TRUE;
@@ -1253,6 +1254,16 @@ zil_lwb_write_done(zio_t *zio)
 	 * written out.
 	 */
 	if (zio->io_error != 0) {
+		zil_commit_waiter_t *zcw;
+		for (zcw = list_head(&lwb->lwb_waiters); zcw != NULL;
+		    zcw = list_next(&lwb->lwb_waiters, zcw)) {
+			mutex_enter(&zcw->zcw_lock);
+			ASSERT(list_link_active(&zcw->zcw_node));
+			ASSERT3P(zcw->zcw_lwb, ==, lwb);
+			zcw->zcw_zio_error = zio->io_error;
+			mutex_exit(&zcw->zcw_lock);
+		}
+
 		while ((zv = avl_destroy_nodes(t, &cookie)) != NULL)
 			kmem_free(zv, sizeof (*zv));
 		return;
```
All the zios of the issued lwb are chained together. The original author used ZIO_FLAG_DONT_PROPAGATE explicitly; I guess there must be a reason. Why don't you create a pull request? It looks like a bug that the original code does not catch io_error in zil_lwb_write_done.
Errors in zil_lwb_write_done() are not propagated to zil_lwb_flush_vdevs_done() which can result in zil_commit_impl() not returning an error to applications even when zfs was not able to write data to the disk. Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to allow errors to propagate and consolidate the error handling for flush and write errors to a single location (rather than having error handling split between the "write done" and "flush done" handlers). Reviewed-by: George Wilson <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Signed-off-by: Arun KV <[email protected]> Closes openzfs#12391 Closes openzfs#12443
System information
Describe the problem you're observing
zvol (sync=always): write(O_SYNC) returns success when the iSCSI disk is not accessible.
The application gets a write success from a zvol created with sync=always even when the disk is not accessible.
Note: only the first write after the disk becomes inaccessible is wrongly acknowledged as successfully written; after that, all further writes from the application wait.
Describe how to reproduce the problem
… keep writing until ZFS prints the "WARNING: Pool 'zpool' has encountered an uncorrectable I/O failure and has been suspended." message in the syslog, and you will see that the write call now returns success.
Fix the disk and run the zpool clear command.

Include any warning/errors/backtraces from the system logs
Example Program
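The example program itself was not captured here; a minimal reproducer of the kind described would look like the sketch below (the zvol device path is an assumption - substitute your own zvol):

```c
/*
 * Minimal O_SYNC write reproducer (sketch). With sync=always (or
 * O_SYNC), each write should only return success once the data is
 * on stable storage. Watch the printed return values while taking
 * the backing disk offline.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char buf[4096];
	int fd;

	memset(buf, 0xab, sizeof (buf));

	/* Hypothetical device path -- substitute your own zvol. */
	fd = open("/dev/zvol/zpool/zvol1", O_WRONLY | O_SYNC);
	if (fd < 0) {
		perror("open");
		return (1);
	}

	for (unsigned long i = 0; ; i++) {
		ssize_t n = pwrite(fd, buf, sizeof (buf),
		    (off_t)(i * sizeof (buf)));
		printf("write %lu: %zd\n", i, n);
		if (n < 0) {
			perror("pwrite");
			break;
		}
		sleep(1);
	}

	close(fd);
	return (0);
}
```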