Support for "workflow chain IDs" in all APIs that are able to address the "latest" run #2691

macrogreg · 2022-03-31T17:21:47Z

We discussed this over Slack. I am adding an issue here so that we can keep track of it. While this is not (yet) time-critical, I would like to design the .NET SDK under the assumption that this will eventually be implemented before we release production-ready versions of the SDK.

Below, I copy a slightly edited version of the Slack conversation for context and records.

x x x x x x x x x x x x

Hey folks, I am trying to understand the feasibility of the following:

Consider the public server APIs that operate on a particular Workflow Run, e.g. QueryWorkflow, TerminateWorkflowExecution, and many others. These APIs tend to take a WorkflowExecution, which is a (workflow_id, run_id) tuple. For most (all?) such APIs, the run_id may be omitted, in which case the invocation applies to the most recent run that carries the specified workflow_id.
This is the situation today. Please correct me if I am wrong. 😃

Now, two questions (the first may have been asked before).

(1)
Could those APIs be extended such that, instead of specifying the run_id, the user could specify the chain_start_run_id (i.e., the run_id of the first, oldest run in the execution chain)? The API would then apply to the most recent (newest) run in the chain identified by the chain_start_run_id. If the chain finishes at some point and a new chain with the same workflow id is started, invocations that specify the chain_start_run_id would not "flow" over to the new chain; they would continue to refer to the finished chain.
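To make the proposed resolution rule concrete, here is a minimal sketch, assuming a toy in-memory model rather than anything the server actually does. `ToyServer`, `add_run`, and `resolve` are hypothetical names; only the `chain_start_run_id` concept comes from this thread. An unpinned lookup returns the newest run for the workflow id, while a pinned lookup considers only runs of the named chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    workflow_id: str
    run_id: str
    chain_start_run_id: str  # run_id of the oldest run in this run's chain
    seq: int                 # creation order; higher means newer


class ToyServer:
    """Hypothetical in-memory stand-in, not Temporal's implementation.
    It does not model whether a run is open or closed."""

    def __init__(self) -> None:
        self.runs: list[Run] = []

    def add_run(self, workflow_id: str, run_id: str,
                chain_start_run_id: Optional[str] = None) -> Run:
        # Omitting chain_start_run_id means this run starts a new chain.
        run = Run(workflow_id, run_id, chain_start_run_id or run_id, len(self.runs))
        self.runs.append(run)
        return run

    def resolve(self, workflow_id: str, run_id: Optional[str] = None,
                chain_start_run_id: Optional[str] = None) -> Run:
        candidates = [r for r in self.runs if r.workflow_id == workflow_id]
        if run_id is not None:
            # An explicit run_id addresses exactly that run.
            candidates = [r for r in candidates if r.run_id == run_id]
        elif chain_start_run_id is not None:
            # Pinned: only runs in the specified chain are eligible.
            candidates = [r for r in candidates
                          if r.chain_start_run_id == chain_start_run_id]
        if not candidates:
            raise LookupError("no matching run")
        return max(candidates, key=lambda r: r.seq)


server = ToyServer()
server.add_run("W1", "R18")                            # chain 1 starts
server.add_run("W1", "R42", chain_start_run_id="R18")  # continue-as-new in chain 1
server.add_run("W1", "R77")                            # chain 2 reuses "W1"
print(server.resolve("W1").run_id)                            # R77
print(server.resolve("W1", chain_start_run_id="R18").run_id)  # R42
```

The pinned lookup stays inside chain "R18" even though a newer chain with the same workflow id exists.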

The purpose of this is hopefully clear: A chain represents a workflow with one or more runs (caused by retries, continue-as-new continuations, ...). Once such a chain finishes, the workflow logically concludes. A new chain is a completely new workflow (with the same workflow id). Typically, a user who interacts with a specific workflow does not want to switch to interacting with a new workflow without noticing.

I assume that the answer to this part of the question is Yes.
There is even a corresponding PR for the API.

(There, chain_start_run_id is called first_execution_run_id, but the name does not matter at this stage. For the current discussion only the concept, not the term, is critical, so I'll temporarily stick to chain_start_run_id for brevity and clarity. In fact, I would love to coin the term "workflow chain id", as it is such an important concept that it deserves its own name. But, again, this terminology is not in scope here. 😃)

Either way, that PR does not solve the issue completely. The SDKs must not only be able to supply the chain_start_run_id; they also need to know it. Thus:

(2)
Now the second (related) question: Can all those APIs be extended in a way so that their return payload includes the chain_start_run_id of the chain that contains the run to which the call in fact applied?

For example:

A user calls SignalWorkflowExecution(workflowId="W1", runId=null, ...).

This means "send a signal to the latest (most recent) run with the workflow id 'W1'".
Now, the server will determine the latest run with that workflow id and deliver the signal to that particular run. (Let's assume that the run_id of that run was "R42".)

After that, the user probably wants to continue interacting with "the workflow" that was affected by that signal. On a technical level, "the workflow" is a chain of workflow runs. The user likely wants to keep interacting with the "latest" run only as long as that run is still part of the same chain as "R42". If the chain finishes and a new run is started with the same workflow id, that run is no longer part of the same logical workflow, and the user likely does not want to interact with it in the same session.

How can we help the user avoid unintentionally "overflowing" beyond the end of the workflow chain?

We ensure that SignalWorkflowExecution(..) includes the chain_start_run_id in its return payload. Then, after starting the set of interactions described above, the client knows the chain_start_run_id of the chain that contained the run with run_id="R42". (Let's assume that chain_start_run_id was "R18".) All subsequent invocations would then include that information.
E.g., to send another signal, the user invokes
SignalWorkflowExecution(workflowId="W1", chain_start_run_id="R18", runId=null, ...)

which means "send signal to the latest run in the chain with chain_start_run_id="R18"".

If the user wanted to address the actual latest run, without restricting the call to the same workflow chain they interacted with previously, they would simply omit the chain_start_run_id.

Problem solved. :)
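The request/response flow above can be sketched as follows, assuming a toy in-memory server whose signal call returns the proposed chain_start_run_id. `ToyServer`, `SignalResponse`, `add_run`, `close_run`, and `signal_workflow_execution` are hypothetical names, not Temporal's API. The sketch also assumes that a pinned call whose chain has already finished fails, rather than reaching a newer chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignalResponse:
    # Hypothetical response payload: the run that actually received the
    # signal, plus the proposed chain_start_run_id of that run's chain.
    run_id: str
    chain_start_run_id: str


class ToyServer:
    """Hypothetical in-memory server, not Temporal's implementation."""

    def __init__(self) -> None:
        # workflow_id -> [(run_id, chain_start_run_id)], oldest first
        self.runs: dict[str, list[tuple[str, str]]] = {}
        self.closed: set[str] = set()  # run_ids that have finished

    def add_run(self, workflow_id: str, run_id: str,
                chain_start_run_id: Optional[str] = None) -> None:
        # Omitting chain_start_run_id means this run starts a new chain.
        self.runs.setdefault(workflow_id, []).append(
            (run_id, chain_start_run_id or run_id))

    def close_run(self, run_id: str) -> None:
        self.closed.add(run_id)

    def signal_workflow_execution(
            self, workflow_id: str, run_id: Optional[str] = None,
            chain_start_run_id: Optional[str] = None) -> SignalResponse:
        # Walk runs newest-first and pick the first one that matches.
        for rid, chain in reversed(self.runs.get(workflow_id, [])):
            if run_id is not None and rid != run_id:
                continue
            if chain_start_run_id is not None and chain != chain_start_run_id:
                continue
            if rid in self.closed:
                # The newest matching run has finished: fail loudly instead
                # of falling through to an older or unrelated run.
                raise RuntimeError(f"run {rid} has already finished")
            return SignalResponse(rid, chain)
        raise LookupError("no matching run")
```

For example, after "R18" continues as new into "R42", an unpinned signal to "W1" returns run_id "R42" and chain_start_run_id "R18". If "R42" then completes and a new run "R77" reuses "W1", an unpinned signal reaches "R77", while a signal pinned to chain "R18" raises an error instead of silently reaching "R77".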

This may sound a little complicated, but I believe that once you think it through, it is quite straightforward. And, of course, we do not actually expect users to deal with this complexity. Language SDKs will store the chain_start_run_id in whatever object they use to refer to a workflow and to invoke APIs on it (e.g., to send a signal to a workflow).
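One way an SDK could hide this from users is a handle object that learns its chain from the first response and pins every later call to it. This is a minimal sketch; `WorkflowHandle`, `CallResult`, and the `Transport` callable are hypothetical stand-ins, not the API of the .NET SDK or any other Temporal SDK, and the transport is assumed to return the proposed chain_start_run_id with every response.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CallResult:
    run_id: str              # run the server actually applied the call to
    chain_start_run_id: str  # proposed extra field in every response


# Hypothetical transport: (workflow_id, run_id, chain_start_run_id, op) -> CallResult
Transport = Callable[[str, Optional[str], Optional[str], str], CallResult]


class WorkflowHandle:
    """Hypothetical SDK handle that pins itself to one workflow chain."""

    def __init__(self, transport: Transport, workflow_id: str,
                 chain_start_run_id: Optional[str] = None) -> None:
        self._transport = transport
        self.workflow_id = workflow_id
        # None until the first call tells us which chain we landed in.
        self.chain_start_run_id = chain_start_run_id

    def _call(self, op: str) -> CallResult:
        # run_id is always left empty so the call targets the chain's latest run.
        result = self._transport(self.workflow_id, None,
                                 self.chain_start_run_id, op)
        if self.chain_start_run_id is None:
            # First interaction: remember the chain so later calls cannot
            # silently move to a newer chain that reuses the workflow id.
            self.chain_start_run_id = result.chain_start_run_id
        return result

    def signal(self, name: str) -> CallResult:
        return self._call(f"signal:{name}")

    def query(self, name: str) -> CallResult:
        return self._call(f"query:{name}")
```

A handle obtained by starting a workflow could be pinned immediately, since a freshly started run is by definition the first run of its chain; the lazy pinning above covers handles created from a bare workflow id.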

So, my question to the server team is: how hard / feasible is it to extend the APIs in the manner described? Is it something we can reasonably tackle?

Thank you!

x x x x x x x x x x x x

Below is a minimally edited record of the Slack conversation about this topic between a few people.


macrogreg · 2022-03-31T17:22:23Z

Response from Slack:

This should be pretty straightforward to do. The server already keeps track of the chain_start_run_id as FirstExecutionRunId in mutable state.
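The bookkeeping this reply refers to can be pictured as copying one field forward from run to run. The sketch below is a heavily simplified, hypothetical model (`MutableState`, `start_workflow`, and `continue_as_new` are illustrative names, not the server's actual types or functions): a fresh start makes the new run its own chain start, and continue-as-new carries the value over unchanged.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class MutableState:
    """Hypothetical, heavily simplified per-run record."""
    workflow_id: str
    run_id: str
    first_execution_run_id: str  # the chain_start_run_id from this thread


def start_workflow(workflow_id: str) -> MutableState:
    run_id = str(uuid.uuid4())
    # A fresh start begins a new chain, so the run is its own chain start.
    return MutableState(workflow_id, run_id, first_execution_run_id=run_id)


def continue_as_new(prev: MutableState) -> MutableState:
    # Continue-as-new creates a new run in the same chain, so the chain
    # identifier is copied forward unchanged.
    return MutableState(prev.workflow_id, str(uuid.uuid4()),
                        first_execution_run_id=prev.first_execution_run_id)
```

Because every run in a chain carries the same value, returning it in API responses would not require any extra lookup in this model.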

macrogreg · 2022-03-31T17:23:59Z

Response from Slack:

I agree that this might not be hard to do but I don’t think it’s the semantic we want to promulgate. I realize that it’s possible to use the existing APIs in this fashion, but (with the acknowledgement that I’m not the most experienced user from the client side), I think this pattern promotes complexity instead of discouraging it. Basically, I think that the correct model is for the user to change the workflow ID at the end of a chain. So each Workflow is exactly one chain (in the parlance of macrogreg’s question).

Broadly speaking I think that we would be better served by directing our users to compose simpler semantic constructs to achieve complexity, and only adding new semantics like this “chain_start_run_id” when the win is huge. If the user follows the pattern that I’m proposing here, then he gets exactly the semantic macrogreg asked for using our existing API.

Please tell me if I am missing something.

macrogreg · 2022-03-31T17:27:25Z

Response from Slack:

I completely agree that it would be conceptually much simpler if we simply disallowed multiple workflow chains with the same workflow id.

The problem discussed in this thread would not exist.

Moreover, we would not need to explain the confusing part about the workflow-id uniquely describing a running chain, but not really uniquely describing a chain in general, because there may be multiple chains with the same workflow id as long as only one of them is running.

But, at some point, for some reason (probably a good one, but either way, it's a done deal now) we decided that we wanted to support a workflow-id-reuse-policy that allowed creating new workflow chains with an id that was used previously.

One of our central promises is that Temporal-based software is easy to use and robust. I am not sure how frequently the workflow-id-reuse-policy is used in practice. But in combination with the very powerful ability to interact with a workflow that finished in the past (query, result, ...), the workflow-id-reuse-policy does break the clean and simple "workflow id can be used to address a workflow" assumption.

The desired outcome of the strategy described in this thread is to solve a problem that, when it occurs, is hard for our users to understand and diagnose, which is something we strive to avoid.

And, I think, the proposed solution is conceptually clean and simple with the help of an SDK that supports it. But perhaps there are other approaches to achieve this outcome that can work as well?

So, overall, I agree. It is easier and cleaner to discourage reusing workflow-ids. But if we believe that that feature has value and is not a historical mistake that we wish we did not make, then we need to have some way of protecting people from the caveat of the "chain overflow" described in this thread. 😃

macrogreg · 2022-03-31T17:28:14Z

Response from Slack:

the configured default already disallows workflow ID reuse. Personally, I think it would be sufficient to provide a warning about this issue on the page where we tell users how to change the default. There’s only so much you can do to prevent someone from shooting themselves in the foot.

macrogreg · 2022-03-31T17:29:16Z

Response from Slack:

the configured default already disallows workflow ID reuse

This is incorrect. The default ID reuse policy is "allow duplicate".

macrogreg · 2022-03-31T17:30:40Z

Response from Slack:

There’s only so much you can do to prevent someone from shooting themselves in the foot.

The "safety" proposed is, essentially, transparent / for free to the SDK users. What would be the drawback of having it?

macrogreg · 2022-03-31T17:31:14Z

Response from Slack:

I agree that if we disallowed duplicate workflows with the same ID, that would be much simpler. But the reality is that we allow it via the ID reuse policy, and there are many use cases that need that feature. We cannot take it away. With that, I think the proposed solution is reasonable.

macrogreg · 2022-03-31T17:31:50Z

Response from Slack:

Fun fact: we are not taking away the ID reuse policy, but we are adding more options to it: #2608 😄

macrogreg · 2022-03-31T17:32:18Z

Response from Slack:

Ability to reuse ID is very important for many user facing scenarios.

macrogreg added the enhancement (New feature or request) and API (Issues/features involving the API) labels Mar 31, 2022

sync-by-unito bot closed this as completed Mar 3, 2023

yiminc reopened this Mar 3, 2023
