Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update mutable state consistency check logic #2747

Merged
merged 7 commits into from
Apr 26, 2022
Merged

Update mutable state consistency check logic #2747

merged 7 commits into from
Apr 26, 2022

Conversation

wxing1292
Copy link
Contributor

What changed?

  • Utilize shard clock or shard ownership API for consistency check
    • Workflow task:
      • Start
      • Completion
      • Failure
    • Activity task:
      • Start
      • Heartbeat
      • Completion
      • Failure
      • Cancellation
    • Child workflow:
      • Start
      • Completion
  • Utilize shard ownership API for consistency check
    • Delete workflow
    • Terminate workflow
    • RequestCancel workflow
    • SignalWorkflowExecution
    • RemoveSignalMutableState
    • ResetWorkflowExecution
    • RefreshWorkflowTasks

Why?
see #2743

How did you test it?
Updated tests

Potential risks
N/A

Is hotfix candidate?
N/A

@wxing1292 wxing1292 marked this pull request as ready for review April 23, 2022 00:06
@wxing1292 wxing1292 requested a review from a team as a code owner April 23, 2022 00:06
) error {
if !*shardOwnershipAsserted {
*shardOwnershipAsserted = true
return shardContext.AssertOwnership(ctx)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samarabbas @paulnpdev @yiminc
added a new shard & persistence API to assert shard ownership

@wxing1292
Copy link
Contributor Author

Will add additional unit test once this approach is accepted

namespaceID string,
workflowID string,
) (string, error) {
shardOwnershipAsserted := false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the usage of this pass by ref param is not very clear. Could you put some comment explaining how this is used and why it need to be pass by ref

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it be better on readability to use sync.once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync once should be considered if multiple goroutine race to initialize something
the logic here is per goroutine: call this function at most once per request

service/history/consistency_checker.go Show resolved Hide resolved
service/history/consistency_checker.go Show resolved Hide resolved
service/history/consistency_checker.go Outdated Show resolved Hide resolved
Comment on lines -360 to -362
if currentWorkflowTaskRunning || scheduleID < msBuilder.GetNextEventID() {
break
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this logic replaced by the inline predicate at line 327?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

return nil
}

func BypassMutableStateConsistencyPredicate(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is used in most of places, including RespondActivityTaskCompleted where the action of updateWorkflowActionFunc still rely on scheduleID >= mutableState.GetNextEventID() to detect stale cache do we still need those? I think the intention is to rely on the vclock to detect the staleness in those cases?

Copy link
Contributor Author

@wxing1292 wxing1292 Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is used in most of places, including RespondActivityTaskCompleted where the action of updateWorkflowActionFunc still rely on scheduleID >= mutableState.GetNextEventID() to detect stale cache do we still need those? I think the intention is to rely on the vclock to detect the staleness in those cases?

currently only respond workflow task completed will use a customized predicate, i will probably update other place after this huge PR

I think the intention is to rely on the vclock to detect the staleness in those cases?

yes, but there will be a transition period: forward / backward compatibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one exception, respond activity by ID, this set of APIs will probably continue to use existing checks

* Utilize shard clock or shard ownership API for consistency check
  * Workflow task:
    * Start
    * Completion
    * Failure
  * Activity task:
    * Start
    * Heartbeat
    * Completion
    * Failure
    * Cancellation
  * Child workflow:
    * Start
    * Completion
* Utilize shard ownership API for consistency check
  * Delete workflow
  * Terminate workflow
  * RequestCancel workflow
  * SignalWorkflowExecution
  * RemoveSignalMutableState
  * ResetWorkflowExecution
  * RefreshWorkflowTasks
This reverts commit cf9e3c8.
@wxing1292 wxing1292 merged commit 8e3fb97 into temporalio:master Apr 26, 2022
@wxing1292 wxing1292 deleted the citadel-4-shard-ownership branch April 26, 2022 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants