-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TA3151]feat(snap_rebuild)Flush outstanding IOs before taking snapshot on rebuild clone. #123
[TA3151]feat(snap_rebuild)Flush outstanding IOs before taking snapshot on rebuild clone. #123
Conversation
Signed-off-by: satbir <[email protected]>
lib/libzrepl/data_conn.c
Outdated
#endif | ||
while (1) { | ||
if (!zinfo->quiesce_done) | ||
sleep(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a doubt. If replica gets disconnected from the target during rebuilding and quiescing was requested then will this thread exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current zfs takes care that - if either mgmt or data conn breaks, any rebuilding related threads also will be exited.
With current quiescing code also, I don't think there will be any issue, except that, it may be delayed to break rebuild threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I is fixed now.
Signed-off-by: satbir <[email protected]>
lib/libzrepl/data_conn.c
Outdated
|
||
/* IO to clone should be sent only when it is from app */ | ||
if (!is_rebuild && (zinfo->clone_zv != NULL)) { | ||
if (!is_rebuild && !ZVOL_IS_HEALTHY(zinfo->main_zv) && | ||
(zinfo->clone_zv != NULL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove check for clone_zv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/libzrepl/data_conn.c
Outdated
@@ -334,6 +341,12 @@ uzfs_zvol_worker(void *arg) | |||
} | |||
} | |||
|
|||
/* App IOs should go to cloen_zv */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can make this as 'else' for above 'if' without NULL check for clone_zv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if (inject_error.delay.rebuid_io_quiesce_check_by_pass) | ||
continue; | ||
#endif | ||
while (1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may need to be executed only in the thread where we had set these flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after discussion this code have no side effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Signed-off-by: satbir <[email protected]>
if (!rebuild_cmd_req && | ||
!ZVOL_IS_HEALTHY(zinfo->main_zv) && | ||
(zinfo->clone_zv != NULL)) | ||
read_zv = zinfo->clone_zv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should do the same for the sync also (ZVOL_OPCODE_SYNC) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done sir.
Signed-off-by: satbir <[email protected]>
Signed-off-by: satbir <[email protected]>
@@ -213,6 +213,19 @@ taskq_wait_outstanding(taskq_t *tq, taskqid_t id) | |||
taskq_wait(tq); | |||
} | |||
|
|||
int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment related to function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sir I think this is not latest code. ....I have renamed it to taskq_check_active_ios(). Please do let me know if you meant else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just few lines of comment explaining the i/p and o/p of this function
include/zrepl_prot.h
Outdated
@@ -169,6 +169,7 @@ enum zvol_status { | |||
|
|||
typedef enum zvol_status zvol_status_t; | |||
|
|||
#define ZVOL_IS_HEALTHY(zv) (zv->zv_status == ZVOL_STATUS_HEALTHY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets us uzfs_zinfo_get_status(zinfo) getter function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some more macros which are also not as per guidelines.
#define ZINFO_IS_DEGRADED(zinfo) (ZVOL_IS_DEGRADED(zinfo->main_zv))
#define ZVOL_IS_DEGRADED(zv) (zv->zv_status == ZVOL_STATUS_DEGRADED)
Let me fix them with next PR. This PR seems to be little heavy weight.
lib/libzrepl/data_conn.c
Outdated
@@ -348,6 +356,8 @@ uzfs_zvol_worker(void *arg) | |||
|
|||
case ZVOL_OPCODE_SYNC: | |||
uzfs_flush_data(zinfo->main_zv); | |||
if (zinfo->clone_zv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets check !ZVOL_IS_HEALTHY(zinfo->main_zv) here also instead of checking for zinfo->clone_zv).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (zinfo->quiesce_done || | ||
!taskq_check_active_ios( | ||
zinfo->uzfs_zvol_taskq)) { | ||
zinfo->quiesce_done = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to set zinfo->quiesce_done to one? if condition already make sure of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, saw the || condition. looks ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -158,6 +158,11 @@ typedef struct zvol_info_s { | |||
|
|||
/* Will be used to singal ack-sender to exit */ | |||
uint8_t conn_closed; | |||
|
|||
/* Rebuild flags to quiesce IOs */ | |||
uint8_t quiesce_requested; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use these two as bits in 'flags' variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and create setter/getter macros around these variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.. we can't set them as bits, as there is no lock in reading these variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -348,6 +356,8 @@ uzfs_zvol_worker(void *arg) | |||
|
|||
case ZVOL_OPCODE_SYNC: | |||
uzfs_flush_data(zinfo->main_zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we should flush to main_zv if (ZVOL_IS_REBUILDING_AFS(zinfo->main_zv) || ZVOL_IS_HEALTHY(zinfo->main_zv).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
… TA3151 Signed-off-by: Satbir <[email protected]>
Codecov Report
@@ Coverage Diff @@
## zfs-0.7-release #123 +/- ##
===================================================
- Coverage 52.4% 51.96% -0.45%
===================================================
Files 241 241
Lines 77442 77437 -5
===================================================
- Hits 40587 40240 -347
- Misses 36855 37197 +342 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes looks good
This PR is to flush all outstanding IOs before create snapshot on rebuild clone. This way we are making sure that all IOs will be flushed to clone and we are not missing any IO.
Signed-off-by: satbir [email protected]