Fix archival activities error handling #3227
if !common.IsPersistenceTransientError(err) {
	if _, ok := err.(*serviceerror.WorkflowNotReady); !ok {
		logger := tagLoggerWithHistoryRequest(tagLoggerWithActivityInfo(container.Logger, activity.GetInfo(ctx)), &request)
Suppress logs for the WorkflowNotReady error, which is expected for the deleteWorkflowExecution call. The error will still be logged if all retries fail.
@@ -167,7 +167,7 @@ func (h *handler) handleHistoryRequest(ctx workflow.Context, request *ArchiveReq
 	localActCtx := workflow.WithLocalActivityOptions(ctx, lao)
 	err = workflow.ExecuteLocalActivity(localActCtx, deleteHistoryActivity, *request).Get(localActCtx, nil)
 	if err != nil {
-		logger.Error("deleting history failed, this means zombie histories are left", tag.Error(err))
+		logger.Error("deleting workflow execution failed all retires, skip workflow deletion", tag.Error(err))
Will the workflow data stay in the DB if it just skips the deletion?
Yes. The archival workflow retries both uploading and deletion for up to 5 minutes and then gives up. This is a limitation of the existing archival design, and the issue will go away once we have a separate archival queue.
For now, users should monitor the metrics for archival delete non-retryable errors and use the admin wf del command to manually delete those workflows from the DB.
} else {
	err = temporal.NewApplicationError(err.Error(), "", nil)
This is the default `error` to `ApplicationError` conversion. I would just return `err` here.
And everywhere below.
Makes sense. Then no change to `err` is needed, and the defer now looks like the following:
defer func() {
	sw.Stop()
	if err == errUploadNonRetryable {
		scope.IncCounter(metrics.ArchiverNonRetryableErrorCount)
	}
}()
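A deferred comparison like the one above only works when `err` is a named return value and the sentinel is returned unchanged. The following is a minimal, self-contained sketch of that pattern; `upload` and `nonRetryableCount` are hypothetical stand-ins for the activity function and the metrics counter, not the actual code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errUploadNonRetryable mirrors the sentinel name used in the thread.
var errUploadNonRetryable = errors.New("upload non-retryable error")

// nonRetryableCount is a hypothetical stand-in for the metrics counter.
var nonRetryableCount int

// upload uses a named return value so the deferred function can inspect
// the final err after the return statement has assigned it.
func upload(fail bool) (err error) {
	defer func() {
		// The == comparison matches only if the sentinel is returned
		// as is, without being converted or wrapped on the way out.
		if err == errUploadNonRetryable {
			nonRetryableCount++
		}
	}()
	if fail {
		return errUploadNonRetryable
	}
	return nil
}

func main() {
	_ = upload(true)
	_ = upload(false)
	fmt.Println(nonRetryableCount)
}
```

If the function converted the sentinel into a different error value before returning, the equality check in the defer would presumably stop matching, which is one reason returning `err` directly keeps the defer simple.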
97bbac9 to 706181f