Single processing output always return "value" #412
The output format returned from To use the |
@jerstlouis I think there is a good distinction between explicitly requesting Omitting |
@francescoingv @fmigneault @pvretano
That is by design! In the original question above, @francescoingv is asking about 7.11.4.2 and 7.11.4.3, which are talking specifically about synchronous execution. Regarding the original intent for 1.1/2.0 as we initially proposed, the client never gets a results.yaml response from a single output synchronous execution response. There is a single output, so the client gets that back directly. In synchronous execution, the client only gets back a results.yaml-like response when:
See the original summary proposal for 1.1/2.0:
I believe the draft is still following that roadmap, so:
No, because sync execution with single output never returns a results.yaml response.
The preference never changes whether results.yaml is used or not.
Only applies for multiple outputs, when negotiating
No, there is no way that the server returns a results.yaml for a single output in synchronous execution. Or the intermediate entity can make the request and store the result somewhere (e.g., object storage) and share that link. One reason behind this direct response approach is to greatly facilitate implementing synchronous execution without the need for an implementation to implement persistent storage at all (which can be provided by such another entity instead). The implementation can execute the process, return the response, and forget about it completely as soon as it has returned the response. (NOTE: the server doesn't have to follow this execute & forget approach. It can implement and remember This is possible by either only having single output processes, or by always embedding outputs directly in the case of multi outputs for the results.yaml response (always doing the This makes it easy to delegate the "execution" and the "storage of results" separately to different software components. In particular, with the Processes - Part 3: Workflows "Collection Output" approach, the clients always make small / quick requests for a given Time/Area/Resolution of interest, and there is never a need to store results (though the server can cache results). The exception is when the process is not localizable and everything really needs to be processed before anything can be returned at all. |
SWG meeting from 2024-05-27: We want to preserve the behavior that, for sync execution with a single output value, you get that value directly. |
I do not disagree with the design of the response to return a single output if requested. However, I can see how that can lead to confusion by clients (hence this issue) when no specific output was requested/filtered, given that omitting Given that some processes return 1 output and others |
The requested outputs when omitting
The client can do that of course, but if it decides to omit |
Agreed, but having the response change content-type and structure entirely to represent "all outputs" whether that value is On the other hand, if the
The client could very well do the same to specify the single output to obtain directly if that is the intention. At least, the result would be consistent each time, and it would support all possible combinations of responses regardless of the |
You could potentially interpret it this way but that's not the way it must be interpreted right now for 1.0, and changing the interpretation will break things. What makes it a package right now is whether there are more than 2 outputs requested, regardless of whether you omitted When you specify a single output explicitly, you still say |
If you define If another process using a similar definition just so happened to have a second output, The fact that parsing the response must behave differently depending on the number of outputs the process supports, when only 1 is requested in each case, is a bad design. From a client's point of view, this forces unnecessary if/else conditions and distinct code paths to handle each case, when everything could be handled with the same logic if the specific output needed was requested explicitly each time, or that omitting an Another edge case is if a step process in a workflow gets updated later on from one output to two. Using the explicit output to get the value, the workflow would not need to update at all and the operation would continue to work transparently. With 1.0, the workflow would break and would require an update to specify the output, something that could have been avoided from the start when building the workflow, since you had to pick your output anyway when designing the workflow chain.
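To make the two code paths discussed here concrete, the following is a minimal client-side sketch, not taken from the specification. It assumes the behavior described earlier in the thread: a synchronous request for one output returns that output directly, while a request for several outputs returns a results document keyed by output id. The function name `read_sync_response` and the `requested` parameter are hypothetical.

```python
import json


def read_sync_response(body, content_type, requested):
    """Map a synchronous execute response to {output_id: value}.

    Assumption: `requested` lists the output ids effectively requested
    (all outputs of the process when "outputs" was omitted).
    """
    if len(requested) == 1:
        # Single output: the response body *is* the output.
        media_type = content_type.split(";")[0].strip().lower()
        value = json.loads(body) if media_type == "application/json" else body
        return {requested[0]: value}
    # Several outputs: a results document keyed by output id.
    document = json.loads(body)
    return {output_id: document[output_id] for output_id in requested}
```

The branch on `len(requested)` is the if/else the comment above objects to: the client needs to know how many outputs are in play before it can interpret the body.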
This is the whole point about this issue. When there are I do not think that leaving this behavior as is, just because that is how it was done in 1.0, is enough of a justification. |
I'm confused. In this case, the client is still asking for a single output, the response is still a single output, so the response works the same way as in the first case? Did you mean to talk about if the client omits the If the client always wants a single output, then it should use
That's a valid concern, but it is also easily avoided by the client always explicitly specifying the single output. I think it is also not specific to outputs, but also applies to the required inputs, and could generally be addressed by the concept of WellKnownProcesses, where process descriptions used in workflow steps should not change in a breaking way after some well known process has been standardized.
Note that the results.yaml JSON should ideally not be mandatory (see #415, at least when large binary outputs are produced that need to be either linked to or base64-encoded), so that the server has the option to return a 406 Not Acceptable, and only supporting e.g. returning a zip file instead for multiple outputs.
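As an illustration of the 406 option mentioned here, the sketch below shows one way a server that can only package multiple outputs as a zip file might answer content negotiation. It is an assumption-laden simplification: the function name `negotiate` is hypothetical, and `q` weights in the `Accept` header are ignored (media types are tried in the order listed).

```python
def negotiate(accept_header, supported):
    """Return (status, media_type) for a multi-output sync response.

    Simplification: parameters such as q-values are stripped and ignored.
    """
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    ]
    for media_type in accepted:
        if media_type == "*/*" and supported:
            return 200, supported[0]
        if media_type in supported:
            return 200, media_type
    # Nothing the client accepts can be produced.
    return 406, None
```

A zip-only server would call `negotiate(accept, ["application/zip"])` and answer a client asking only for `application/json` with 406 Not Acceptable.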
There is a reason: not forcing sync-only server implementations to implement persistent storage of results, sticking to execute, return, forget. This allows, for example, an OGC API - Processes async implementation supporting job queuing/monitoring/persistent storage to be implemented on top of a simpler OGC API - Processes sync-only implementation (where both implementations may be internal and the OGC API - Processes sync execution is the internal IPC mechanism between the two software components). |
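The layering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `execute_sync` stands in for a call to the sync-only component, the class name `AsyncFacade` is hypothetical, and results are kept in memory rather than in real persistent storage. The status strings reuse the job status values of OGC API - Processes.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncFacade:
    """Adds job tracking on top of an execute-return-forget sync backend."""

    def __init__(self, execute_sync, max_workers=2):
        self._execute_sync = execute_sync  # hypothetical sync-only callable
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job id -> Future (in-memory stand-in for storage)

    def submit(self, process_id, inputs):
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(
            self._execute_sync, process_id, inputs
        )
        return job_id

    def status(self, job_id):
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "successful"

    def results(self, job_id):
        # Blocks until the sync backend has returned its response.
        return self._jobs[job_id].result()
```

The sync backend never stores anything; only the facade remembers jobs and their results.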
I would like to consider the following use-case: Questions:
Possible reasons for this use-case:
If the use-case for the process included While a possible solution could be the process having defined two outputs: "file_content", "file_URL" |
Even if there is a single output, the client could prefer receiving it within the same JSON
Exactly. This is why omitting the
This does not consider Part 2 which allows replacing/updating a process in place. Even if the process endpoint was well known and defined, it could change later on. Submitting the same workflow payload for execution after that update would break suddenly. If the chain can resolve transparently by setting
I don't see how storage is involved here... or any issue about execute/return/forget.
The response structure is consistent regarding the specification of The |
That is not what I meant by a "WellKnownProcess". See opengeospatial/ogcapi-geodatacubes#5 .
As I mentioned in the other issue, the storage is not actually forced because you can always base64-encode the outputs as
That is correct,
Not forcing implementations to support returning by reference allows always returning the outputs by value to execute, return and forget in simple sync implementations.
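A minimal sketch of returning every output by value, so that nothing needs to be stored after the response is sent. The field names `value`, `encoding` and `mediaType` are an assumption modeled on the qualified value structure of Part 1, and `embed_by_value` is a hypothetical helper name.

```python
import base64
import json


def embed_by_value(outputs):
    """Build a results-style JSON document with all outputs embedded.

    `outputs` maps output id -> (media_type, payload). Binary payloads are
    base64-encoded; JSON-compatible payloads are embedded as-is.
    """
    document = {}
    for output_id, (media_type, payload) in outputs.items():
        if isinstance(payload, bytes):
            document[output_id] = {
                "value": base64.b64encode(payload).decode("ascii"),
                "encoding": "base64",  # assumed field name
                "mediaType": media_type,  # assumed field name
            }
        else:
            document[output_id] = payload
    return json.dumps(document)
```

The cost of this approach is response size: base64 inflates binary content by roughly a third.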
Thanks for detailing the use case. The Collection / Dataset output defined in Part 3 might also often be applicable to such scenarios. This sets up the execution request, and provides an OGC API endpoint where to retrieve results (without actually triggering any processing yet -- the client actually interested in the data can request an area of interest which can be processed / cached as needed).
I'm not sure sync-only/single output is related to the value/links result. The reason things are this way is to allow the server as much flexibility as possible in how it decides to return things, because the implementation / deployment method might have some particular limitations, and because the server knows its processing capabilities and the output better than the client. While it is true that the client has slightly less control over the response (like no more mode: raw | document or transmissionMode: value, reference), there were so many options, as illustrated in the tables, that this was really overcomplicated and very difficult to implement correctly in both client and server in 1.0. See Table 11 for 1.0 vs. Table 13 in 1.1/2.0; and Table 12 for 1.0 vs. Table 14 in 1.1/2.0. |
One reason in particular that changing the behavior about returning results.yaml or directly the one output is problematic is that the purpose of So if the change was made to say that omitting outputs altogether always returns a results.yaml, you would still not be able to request this way while selecting a particular output format. Requesting a single output, with 1.1/2.0 you can of course use content negotiation directly to do this instead of using outputs, like you can now also do from |
This is incorrect.
Nothing was forced. Process descriptions were allowed to indicate whether they supported one or both
I disagree. This is not a solution. The point of
But too much flexibility makes servers not able to work together since they cannot even agree on how things are resolved between them and how to agree on specific behaviors.
Not necessarily. The example given by @francescoingv shows just that. The client might know that the resulting link is used to update a website, and nothing more. The server might have no knowledge at all about this intended use. If the process/server advertises that it can support both modes, it should honor it as best as possible when the client asks for it.
I'd argue otherwise as there was no ambiguity or implementation-specific interpretability before. If a server did not want to support one combination, it could simply indicate it as such by omitting the relevant options from its process description. If a server did not reply with the specific combination requested, we could quickly validate which one was misbehaving. Right now, server/client have to do a lot of guesswork, and this is very hard to implement when there is no explicit guidance on how to handle all possible results. When sending a request, I currently can never be sure I will get what I asked for, so I have to try handling all possible side effects.
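The defensive handling described above might look like the sketch below on the client side. It assumes that a by-reference result is a link object carrying `href` and that a by-value result is either a bare value or an object carrying `value`. Both function names are hypothetical.

```python
def classify_output(entry):
    """Tell whether a result entry was returned by reference or by value."""
    if isinstance(entry, dict) and "href" in entry:
        return ("reference", entry["href"])
    if isinstance(entry, dict) and "value" in entry:
        return ("value", entry["value"])
    return ("value", entry)


def require_reference(entry):
    """Fail loudly when the client needed a link but got embedded data."""
    kind, payload = classify_output(entry)
    if kind != "reference":
        raise ValueError("expected a link, got an embedded value")
    return payload
```

A client that only wants a URL to hand off elsewhere would wrap each result in `require_reference`, turning an unexpected by-value response into an explicit error instead of a silent mismatch.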
That is a good point. Relying on a distinct |
Right, I realized that after writing that and looking at the tables again, sorry. I had already edited my comment.
Hmmm. Could we perhaps use the minimal vs. representation here again to split the first row of Table 13? So that if you specify |
My issue is not about Also,
The client would benefit from receiving either the reference or an error code, since simpler code would be needed to handle the difference. While I understand the server returning a reference if a value is requested (e.g. if the returned value is "too big") |
In the past, the client application could be informed that a given process can only support reference (or value) by accessing the Nevertheless, this commit 90e70b2 removed it from the processSummary.yaml schema. |
The issue is about the ability of the next version 1.1/2.0 to accommodate the presented use case, where:
If I understand it correctly:
If the above is correct, then the only solution I see for the presented use case is:
In this case, if the server cannot return the requested output (the URL) for the given request, then it will produce an error code. |
You correctly understood the issue. The use case you want to achieve is completely supported with 1.0, but is not possible in the current revision. There is no way to "enforce" the value/reference format to return. The
|
For the record, our use cases (currently running over WPS) involve:
Sorry if this is too vague to inform the discussions here. |
That is a necessary compromise to allow for implementations that are not required to implement persistent storage. Note that in this case (and any case with Prefer: minimal/representation), it is not a matter of "big enough" or not. With The big enough discretion of the server is when a preference for minimal/representation is not expressed by the client. |
I have no problem with letting applications implement the simplest approach of only supporting direct return value, and the standard allowing for this flexibility. That will help early adoption with reduced implementation requirements. However, I do not believe fulfilling this justifies any compromise of making it harder for other modes to work efficiently as well. My issue is that, for applications that do offer this support of alternate link/storage mode and raw/document formats, When working simultaneously with servers that do support storage and others that don't, the request with Older definitions of
This is also a regression IMO. There is no reason to limit this if the client knows that some outputs would be better used as links for chaining elsewhere while others could be obtained directly to avoid a subsequent request to retrieve their stored contents.
This is not my understanding of the current specification. While |
SWG meeting from 2024-06-10: We will discuss this further at the TC session on June 19th. |
…se body parameters (relates to #376, #414, #701, opengeospatial/ogcapi-processes#412)
In the current Draft version of OGC API - Processes - Part 1: Core (1.1 or 2.0), section 7.11.4.2 "Response requesting a single processing output", the server always returns the "value" (requirements 29 and 30).
I think the returned output should conform to the results.yaml schema.
Should the return preference be considered, and could the server decide whether to return "value" or "reference" also for a single processing output, as spelled out for 7.11.4.3 "Response requesting multiple processing outputs"?