Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INVALID_ARGUMENT thrown when trying to complete an activity thats already completed #2538

Closed
darewreck54 opened this issue Feb 24, 2022 · 7 comments · Fixed by #2561
Closed
Assignees

Comments

@darewreck54
Copy link

The exception below is thrown when trying to complete an activity.

based on this thread https://community.temporal.io/t/reasonable-values-for-task-pollers/2257/2?u=darewreck it states that this could be due to

due to activity timeouts. When an activity takes longer to complete than its StartToClose timeout the call to the service with the activity result is rejected with this message.

In my case, the argument gets thrown because the activity is already completed. Shouldn't another exception be thrown vs. INVALID_ARGUMENT

"io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Cannot locate Activity ScheduleID
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondActivityTaskCompletedById(WorkflowServiceGrpc.java:2762)
	at io.temporal.internal.client.external.ManualActivityCompletionClientImpl.complete(ManualActivityCompletionClientImpl.java:135)
	... 85 common frames omitted
Wrapped by: io.temporal.client.ActivityCompletionFailureException: ActivityId=2be5aea6-d792-3e8d-b0ee-9117f6b246a8
	at io.temporal.internal.client.external.ManualActivityCompletionClientImpl.processException(ManualActivityCompletionClientImpl.java:294)
	at io.temporal.internal.client.external.ManualActivityCompletionClientImpl.complete(ManualActivityCompletionClientImpl.java:137)
	at io.temporal.internal.sync.ActivityCompletionClientImpl.complete(ActivityCompletionClientImpl.java:53)
	at jdk.internal.reflect.GeneratedMethodAccessor454.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at io.temporal.internal.WorkflowThreadMarker.lambda$protectFromWorkflowThread$1(WorkflowThreadMarker.java:80)
	at com.sun.proxy.$Proxy112.complete(Unknown Source)
@Spikhalskiy
Copy link
Contributor

This is not a JavaSDK issue, this response code is defined by the server - transferring the issue into temporal/temporal repo.

@Spikhalskiy Spikhalskiy transferred this issue from temporalio/sdk-java Feb 24, 2022
@Spikhalskiy
Copy link
Contributor

Spikhalskiy commented Feb 24, 2022

It looks that NOT_FOUND or maybe FAILED_PRECONDITION is indeed a more correct code for this situation.

Also, Temporal Server may return a better message to help users understand what is the reason for the rejection.
The wording "cannot locate" should change, because it's surely in the workflow history for example. We either should say "cannot locate in the mutable state" or rephrase if we don't want to expose the implementation details and make it friendlier for users.

Maybe something like io.grpc.StatusRuntimeException: NOT_FOUND: Cannot find in-progress Activity Execution with the ScheduleID, it was never scheduled or already closed.

This way we don't disclose internal implementation details about the mutable state but provide enough information that the activity has to be in-progress at the moment.

@jbreiding
Copy link
Contributor

@Spikhalskiy This all makes sense. Changing the grpc code to FAILED_PRECONDITION or NOT_FOUND, would the SDK respond differently in either case?

NOT_FOUND seems to make the most sense, this piece of the suggested message seems off it was never scheduled or already closed.

Given the SDK has the information about the activity if the response code is NOT_FOUND the sdk could lookup the activities last state and provide the exact cause for the activity and schedule ID not being in mutable state?

@Spikhalskiy
Copy link
Contributor

Spikhalskiy commented Feb 28, 2022

@jbreiding

Changing the grpc code to FAILED_PRECONDITION or NOT_FOUND, would the SDK respond differently in either case?

No, SDK just has io.temporal.client.ActivityCompletionFailureException right now, which looks correct, SDK doesn't have much additional info other than the server error. But the underlying wrapped raw gRPC exception will make more sense for the users.

Given the SDK has the information about the activity if the response code is NOT_FOUND the sdk could lookup the activities last state and provide the exact cause for the activity and schedule ID not being in mutable state?

It's not really given. When we use ManualActivityCompletionClientImpl we are even outside of Temporal worker. The only thing that we have is basically an activityId and temporal API.
But even inside Temporal SDK, there is no way for SDK to figure out what happened here. Theoretically, we can try to query workflow history and try to analyze what happened with this activity, but it's a very heavy approach for a simple problem,
So it looks to me that the Server has to provide enough clues in the error for users to don't be lost.

@jbreiding
Copy link
Contributor

jbreiding commented Feb 28, 2022

I will update the error code and message.

The server during this api call only has the context for lookup in mutable state.

Perhaps instead of it was never scheduled or already closed. the error message could suggest to look through workflow execution history to determine activity state?

@Spikhalskiy
Copy link
Contributor

Spikhalskiy commented Feb 28, 2022

Perhaps instead of it was never scheduled or already closed. the error message could suggest to look through workflow execution history to determine activity state?

This is a safe and 100% correct message, yeah.

But I was just thinking that we may give users the 99% message, but that may be immediately actionable and clear for them without even looking at history. I believe that in the vast majority of cases these two reasons will be causing the absence of activity in the mutable state if things work correctly and there are no bugs or some operational concerns that brought the mutable state in an incorrect state. Just keep in mind that history is a little bit scary for a lot of users. And often we say to users, they may be using Temporal without even deeply understanding the concept of history.

@jbreiding
Copy link
Contributor

Makes sense, I was hoping to have a more directional error message.
Leading the user where to look for this state rather than just providing the possible causes.

@jbreiding jbreiding linked a pull request Mar 1, 2022 that will close this issue
jbreiding added a commit that referenced this issue Mar 2, 2022
* Update error code and message for missing activity task.
Completes #2538

* review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants