feat(split): replica server handle pause and cancel status #681
"wrong partition_status({})",
enum_to_string(status()));
dassert_replica(_split_status == split_status::SPLITTING ||
_split_status == split_status::NOT_SPLIT,
Why allow stop under NOT_SPLIT?
Also, if we send cancel or pause by mistake, the dassert will cause a crash. Is that what you expect?
When a learn happens, the parent partition may stop the split and set its _split_status to NOT_SPLIT without the meta server knowing. So when a pause or cancel split request is synced to the parent partition, its split_status may already be NOT_SPLIT.
"And, if we send cancel or pause by mistake, dassert will cause crash? is your expect?"
Pause and cancel split requests are sent to the meta server, which should check whether the table is splitting (see #679). Besides, the parent partition never sets its split_status to PAUSING or CANCELING; when it receives a pause or cancel request, it sets its split_status to NOT_SPLIT. I'm not sure I've explained this clearly, so please comment if you have any questions.
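The behavior described in this reply can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: sketch_split_status and sketch_on_meta_split_status are hypothetical names, and a plain assert stands in for dassert_replica.

```cpp
#include <cassert>

// Hypothetical stand-in for the replica's split_status enum.
enum class sketch_split_status { NOT_SPLIT, SPLITTING, PAUSING, CANCELING };

// Hypothetical helper: returns the parent partition's new local status after
// the meta server syncs `from_meta` to it.
sketch_split_status sketch_on_meta_split_status(sketch_split_status local,
                                                sketch_split_status from_meta)
{
    const bool stop_requested = from_meta == sketch_split_status::PAUSING ||
                                from_meta == sketch_split_status::CANCELING;
    if (!stop_requested) {
        return local; // nothing to stop, keep the local status
    }
    // The parent may already be NOT_SPLIT (for example after a learn), so both
    // SPLITTING and NOT_SPLIT are accepted here. The parent never stores
    // PAUSING or CANCELING locally, so those would indicate a bug.
    assert(local == sketch_split_status::SPLITTING ||
           local == sketch_split_status::NOT_SPLIT);
    return sketch_split_status::NOT_SPLIT;
}
```

Under these assumptions, a stop request that arrives when the parent is already NOT_SPLIT is a harmless no-op rather than a crash.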
return;
}

if (!resp->__isset.is_split_stopped || !resp->is_split_stopped) {
What is the difference between __isset.is_split_stopped and is_split_stopped?
__isset.is_split_stopped = false means this group_check_response carries no information about pausing or canceling a split, which is the case for both a normal group_check and a group_check sent while splitting. When it is true, the group_check carries pause or cancel split information. resp->is_split_stopped = true means the secondary parent partition paused or canceled the split successfully.
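To illustrate the two flags, here is a small sketch using a hand-written mock rather than the real Thrift-generated group_check_response. The struct names and the helper sketch_secondary_stopped_split are hypothetical, and the mock's isset member only mirrors the role of the generated __isset member.

```cpp
// Mock of the "is this optional field present" flags; plays the role of the
// Thrift-generated __isset member (name changed, since this is not real
// generated code).
struct mock_response_isset
{
    bool is_split_stopped = false;
};

// Mock of a group_check_response with one optional field.
struct mock_group_check_response
{
    bool is_split_stopped = false; // field value, meaningful only if set
    mock_response_isset isset;     // whether the field was set at all
};

// Hypothetical helper: true only when the response carries pause/cancel
// information AND reports that the secondary stopped the split.
bool sketch_secondary_stopped_split(const mock_group_check_response &resp)
{
    return resp.isset.is_split_stopped && resp.is_split_stopped;
}
```

A response from a normal group_check leaves the presence flag unset, so it is never mistaken for a "stop failed" report.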
_replica->_primary_states.split_stopped_secondary.insert(req->node);
auto count = 0;
for (auto &iter : _replica->_primary_states.statuses) {
Can count be stored as a global variable instead of being recomputed on every check?
It's safer to check it each time, because _primary_states.statuses may change during learn. This case does not happen often, so the cost of recomputing is acceptable.
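The reasoning for recomputing can be sketched as below. The diff does not show the full loop, so the exact condition is an assumption: this sketch treats the split as stopped when every current secondary has reported. mock_partition_status and sketch_all_secondaries_stopped are hypothetical names, and nodes are plain strings for simplicity.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the partition status stored per node.
enum class mock_partition_status { PRIMARY, SECONDARY, POTENTIAL_SECONDARY };

// Recomputes the counts from the current membership on every call, so a
// membership change (for example during learn) is always reflected.
bool sketch_all_secondaries_stopped(
    const std::map<std::string, mock_partition_status> &statuses,
    const std::set<std::string> &stopped_secondaries)
{
    int secondary_count = 0;
    int stopped_count = 0;
    for (const auto &kv : statuses) {
        if (kv.second != mock_partition_status::SECONDARY) {
            continue; // only current secondaries are counted
        }
        ++secondary_count;
        if (stopped_secondaries.count(kv.first) > 0) {
            ++stopped_count;
        }
    }
    return stopped_count == secondary_count;
}
```

A cached counter would have to be invalidated whenever the membership changes; recomputing over a handful of replicas avoids that bookkeeping.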
When the meta server pauses or cancels a split, the partition's split_status becomes pausing or canceling (#679), and this split_status is transferred to the replica server through on_config_sync (#653). This PR implements how the replica server handles those two split_status values.

When the primary parent partition receives a pausing or canceling split_status from the meta server, it sets its split_status to not_split and broadcasts it through group_check. Each secondary parent partition also sets its split_status to not_split and sets is_split_stopped = true in group_check_response. The primary parent partition then checks whether all partitions in its group have paused or canceled the split. If all have stopped successfully, it sends a notification to meta, which will be implemented in the next pull request.