Single processing output always return "value" #412

Open
francescoingv opened this issue May 24, 2024 · 24 comments

@francescoingv

In the current Draft version of OGC API - Processes - Part 1: Core (1.1 or 2.0)

  • 7.11.4.2. Response requesting a single processing output, the server always returns the "value" (requirements 29 and 30)

I think the returned output should conform to the results.yaml schema.

Should the return preference be considered,
and could the server decide whether to return "value" or "reference" also for single processing output
as spelled for 7.11.4.3. Response requesting multiple processing outputs?

@fmigneault
Contributor

The output format returned from /processes/{processID}/execution depends on the preference in terms of raw/document (from Prefer: return=minimal|representation), the negotiated content-type (Accept header), whether the server deems the content is small (at its own discretion) and sync/async (from Prefer: wait=X; respond-async).

To use the results.yaml on sync execution each time, I believe the solution would be to not request any outputs (i.e.: omit outputs in the execution body, as per Requirement 28 - /req/core/process-execute-default-outputs). That would result in "all" the outputs to be returned as JSON, even if "all" is only 1 output. How value/href would be embedded within the JSON response would depend on other mentioned preference/negotiation headers.

@fmigneault
Contributor

@jerstlouis
Maybe that case needs to be better explained in the docs?

I think there is a good distinction between explicitly requesting outputs: {} leading to 204 No Contents (the last sync case in Table 13 of section 7.11.4.1), and omitting the outputs field entirely. The first basically says to the server to not bother returning the outputs, which can then be retrieved later in the /jobs/{jobID}/results endpoint. The latter can be interpreted as not requesting any output. It does not explicitly indicate "I do not want any" as the outputs: {} does.

Omitting outputs is equivalent to listing all available outputs explicitly, but that must not be interpreted the same way when "all" outputs just so happens to be only N=1. That causes an ambiguity where the server would always respond as N>=2, return=minimal -> JSON results as per Table 13, except for the N=1 edge-case.

@jerstlouis
Member

jerstlouis commented May 24, 2024

@francescoingv @fmigneault @pvretano

Single processing output always return "value"

That is by design!

In the original question above, @francescoingv is asking about 7.11.4.2 and 7.11.4.3 which are talking specifically about synchronous execution.

Regarding the original intent for 1.1/2.0 as we initially proposed, the client never gets a results.yaml response from a single output synchronous execution response. There is a single output, so the client gets that back directly.

In synchronous execution, the client only gets back a results.yaml-like response when:

  • the client negotiates Accept: application/json,
  • the response generates multiple outputs. Whether the client explicitly requests multiple available outputs, or omits outputs and the process actually declares that it returns multiple outputs, makes no difference. If the process only generates one output, whether the client omits outputs or explicitly asks for the one output is totally equivalent. The client only gets one output back, which is the only condition determining the behavior: how many outputs is the client retrieving from this process for this request?

See the original summary proposal for 1.1/2.0:

#217 (comment)

For a synchronous execution returning multiple outputs, the same rules [for the POST execution response] as for [the GET] /jobs/{jobId}/results apply
For a synchronous execution returning a single output, the same rules [for the POST execution response] as for [the GET] /jobs/{jobId}/results/{resultId} apply
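The rule summarized above can be sketched as a small decision function. This is only an illustration of the behavior as described in this comment, not text from the specification; the function name `sync_response_kind` and the returned labels are made up for the example.

```python
def sync_response_kind(num_outputs: int, accept: str = "application/json") -> str:
    """Classify the body of a synchronous execution response.

    num_outputs is the number of outputs the client ends up retrieving,
    whether they were listed explicitly or "outputs" was omitted.
    """
    if num_outputs == 0:
        # "outputs": {} requested explicitly: nothing to return.
        return "204 No Content"
    if num_outputs == 1:
        # Same rules as GET /jobs/{jobId}/results/{resultId}.
        return "single output returned directly"
    if accept == "application/json":
        # Same rules as GET /jobs/{jobId}/results.
        return "results.yaml-like JSON document"
    # Other packaging negotiated through the Accept header.
    return "other negotiated packaging (e.g. multipart or zip)"
```

Under this reading, the only input that selects the response shape is the number of outputs retrieved, not how the request expressed them.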

I believe the draft is still following that roadmap, so:

I think the returned output should conform to the results.yaml schema.

No, because sync execution with single output never returns a results.yaml response.

Should the return preference be considered,

The preference never changes whether results.yaml is used or not.
Instead, it changes whether the values of the different outputs inside results.yaml are inline JSON objects, or "href" references.
It does not apply in this case, because a sync execution single output returns the one output directly, always.

and could the server decide whether to return "value" or "reference" also for single processing output
as spelled for 7.11.4.3. Response requesting multiple processing outputs?

Only applies for multiple outputs, when negotiating application/json and the response will actually follow results.yaml.

To use the results.yaml on sync execution each time, I believe the solution would be to not request any outputs (i.e.: omit outputs in the execution body, as per Requirement 28 - /req/core/process-execute-default-outputs). That would result in "all" the outputs to be returned as JSON, even if "all" is only 1 output.

No, there is no way that the server returns a results.yaml for a single output in synchronous execution.
If the goal is for having a link for some entity to retrieve the response, that other entity can be the one doing the POST instead and getting the output back directly, rather than doing a 2-step process.

Or the intermediate entity can make the request and store the result somewhere (e.g., object storage) and share that link.

One reason behind this direct response approach is to greatly facilitate implementing synchronous execution without requiring an implementation to provide persistent storage at all (which can be provided by such another entity instead). The implementation can execute the process, return the response, and forget about it completely as soon as it has returned the response. (NOTE: the server doesn't have to follow this execute & forget approach. It can implement and remember /jobs/{jobId}/, but that is completely optional for synchronous execution).

This is possible by either only having single output processes, or by always embedding outputs directly in the case of multi outputs for the results.yaml response (always doing the return=representation behavior).

This makes it easy to delegate "execution" and "storage of results" separately to different software components.

In particular, with Processes - Part 3: Workflows "Collection Output" approach, the clients always make small / quick requests for a given Time/Area/Resolution of interest, and there is never a need to store results (though the server can cache results). Except if the process is not localizable, and everything really needs to be processed before anything can be returned at all.

@bpross-52n
Contributor

SWG meeting from 2024-05-27: We want to preserve that for sync execution for a single output value, you will get that value directly.

@fmigneault
Contributor

fmigneault commented May 28, 2024

I do not disagree with the design of the response to return a single output if requested.

However, I can see how that can lead to confusion by clients (hence this issue) when no specific output was requested/filtered, given that omitting REQUESTED # OUTPUTS as Table 13 indicates does not address the ambiguity between outputs: {} (a.k.a. the none in the table) and "none" provided altogether by omitting the field.

Given that some processes return 1 output and others N>=2, clients would receive completely different response formats when omitting outputs. The response should instead be consistent with the requested outputs. If the single desired output is known in advance, it would make much more sense to include it explicitly in outputs, as this would yield the same direct-output-value response regardless of N.

@jerstlouis
Member

@fmigneault

The response should instead be consistent with the requested outputs

The requested outputs when omitting outputs is all outputs (which might well be only one), and the client needs to check the process description to put together the execution request, so it knows whether N=1 or N>=2.

If the single desired output is known in advance, it would make much more sense to include it explicitly in outputs, as this would yield the same direct-output-value response regardless of N.

The client can do that of course, but if it decides to omit outputs, the request is fully equivalent to making the request with outputs including that single output. This is the only thing that makes sense in my opinion, and so the rule about the response needs to follow this equivalency.

@fmigneault
Contributor

The requested outputs when omitting outputs is all outputs

Agreed, but having the response change content-type and structure entirely to represent "all outputs" depending on whether N=1 or not is not logical. "All" works as a "package", regardless of the value of N, just like a ZIP containing 1 or more files is still a ZIP. Having to handle a response that is sometimes a ZIP to unpack and sometimes the file directly, depending on whether N=1, adds complexity. This also complicates client implementations that have to deal with this varying result.

On the other hand, if the outputs is requested explicitly, there is no complexity, as you "get what you asked for".

the client needs to check the process description to put together the execution request

The client could very well do the same to specify the single output to obtain directly if that is the intention. At least, the result would be consistent each time, and it would support all possible combinations of responses regardless of the number N of outputs, including a results.yaml schema response for N=1, which is not possible currently but could be a legitimate request.

@jerstlouis
Member

"All" works as a "package",

You could potentially interpret it this way but that's not the way it must be interpreted right now for 1.0, and changing the interpretation will break things.

What makes it a package right now is whether two or more outputs are requested, regardless of whether you omitted outputs or explicitly specified two outputs.

When you specify a single output explicitly, you still say "outputs" : { "output1" : { } } so the JSON structure of that is identical to when you request two outputs. So nothing really hints that specifying a single output this way gets "only the output" vs. specifying two or omitting gets a package.

@fmigneault
Contributor

If you define "outputs" : { "output1" : { } } and the process has only output1, the result is the value, which is fine because that is what you asked. If you wanted all outputs, regardless of the amount, you would not bother listing them all unnecessarily.

If another process using a similar definition just so happened to have a second output, "outputs" : { "output1" : { } } would yield an entirely different result than the first use, with the need for extra operations to parse the obtained JSON to extract the value of the requested output.

The fact that parsing the response must behave differently depending on the number of outputs the process supports when only 1 is requested in each case is bad design. From a client's point of view, this forces unnecessary if/else conditions and distinct code paths to handle each case, when everything could be handled with the same logic if the specific output needed was requested explicitly each time, or if omitting an outputs filter returned all of them in a similar results.yaml structure each time regardless of the number of keys.

Another edge case is if a step process in a workflow gets updated later on from one output to two. Using the explicit output to get the value, the workflow would not need to update at all and the operation would continue to work transparently. With 1.0, the workflow would break and would require an update to specify the output, something that could have been avoided from the start when building the workflow, since you had to pick your output anyway when designing the workflow chain.

What makes it a package right now is whether two or more outputs are requested, regardless of whether you omitted outputs or explicitly specified two outputs.

This is the whole point about this issue. When there are N>=2 outputs, it is possible to obtain both the single value and the results.yaml representation, but N=1 forces the value without any way to obtain the results JSON. The behavior is problematic because it limits this capability for no reason.

I do not think leaving this behavior as is because that is how it was done in 1.0 is enough of a justification.

@jerstlouis
Member

jerstlouis commented May 28, 2024

@fmigneault

If another process using a similar definition just so happened to have a second output, "outputs" : { "output1" : { } } would yield an entirely different result than the first use

I'm confused. In this case, the client is still asking for a single output, the response is still a single output, so the response works the same way as in the first case? Did you mean to talk about if the client omits the "outputs" altogether?

If the client always wants a single output, then it should use "outputs" : { "output1" : { } } exactly as in your example, which in my opinion really is the main use case for synchronous execution, and the client never has to deal with either a results.yaml response (Accept: application/json) or a multipart or zip response (other content negotiation options for multiple outputs).

if a step process in a workflow gets updated later on from one output to two.

That's a valid concern, but it is also easily avoided by the client always explicitly specifying the single output.

I think it is also not specific to outputs, but also to the required inputs, and could generally be addressed by the concept of WellKnownProcesses, where process descriptions used in workflows step should not change in a breaking way after some well known process has been standardized.

When there are N>=2 outputs, it is possible to obtain both the single value and the results.yaml representation, but N=1 forces the value without any way to obtain the results JSON.

Note that the results.yaml JSON should ideally not be mandatory (see #415, at least when large binary outputs are produced that need to be either linked to or base64-encoded), so that the server has the option to return a 406 Not Acceptable, and only supporting e.g. returning a zip file instead for multiple outputs.

The behavior is problematic because it limits this capability for no reason

There is a reason: not forcing sync-only server implementations to implement persistent storage of results, sticking to execute, return, forget.

This allows, for example, an OGC API - Processes async implementation supporting job queuing/monitoring/persistent storage to be implemented on top of a simpler OGC API - Processes sync-only implementation (where both implementations may be internal and the OGC API - Processes sync execution is the internal IPC mechanism between the two software components).

@francescoingv
Author

I would like to consider the following use-case:
a process produces a file which is saved somewhere and can be accessed/retrieved using a URL the API can return.
No matter if the process is executed sync or async.
With the standard 1.0 I understand the process description could have: "outputTransmission": [ "reference"]
i.e. the "value" is not offered to be returned, and the "default" would return an error code.

Questions:

  • Is the process description "outputTransmission": [ "reference"] valid?
  • Will this use case still be possible with the next version of the standard?

Possible reasons for this use-case:

  • the link to the file is then used to update a link in a web page, but the caller of the process is not directly interested in the content;
  • the resulting file is used by other processes, and passing the file by value could be expensive in case the file is large.

If the use case also included a caller wanting the file content, the process description should have: "outputTransmission": [ "value", "reference"]
Since the previous use-case still exists, some clients would "prefer" to be sure to receive the "reference" (or an error): one reason is to minimize the logic.

While a possible solution could be the process having defined two outputs: "file_content", "file_URL"
I wonder if this would be recommended.

@fmigneault
Contributor

@jerstlouis

I'm confused. In this case, the client is still asking for a single output, the response is still a single output, so the response works the same way as in the first case? Did you mean to talk about if the client omits the "outputs" altogether?

Even if there is a single output, the client could prefer receiving it within the same JSON results.yaml structure as when N>=2. However, because the requirement about omitting outputs says it is identical to listing them all explicitly, there is no way to achieve this. The output value would only be received directly rather than embedded in the JSON. It is also insufficient to use Accept: application/json in this case to try negotiating the response, since the value returned directly could also itself be JSON. There must be another parameter to distinguish these combinations (basically, what was done using response: raw|document before).

That's a valid concern, but it is also easily avoided by the client always explicitly specifying the single output.

Exactly. This is why omitting the outputs should not have the same meaning as asking for a specific one explicitly, since there is a guaranteed and unambiguous method available to obtain the single value, you request it in the body.

I think it is also not specific to outputs, but also to the required inputs, and could generally be addressed by the concept of WellKnownProcesses, where process descriptions used in workflows step should not change in a breaking way after some well known process has been standardized.

This does not consider Part 2 which allows replacing/updating a process in place. Even if the process endpoint was well known and defined, it could change later on. Submitting the same workflow payload for execution after that update would break suddenly. If the chain can resolve transparently by setting outputs value to return directly, I believe it would be beneficial to always include it, and also make the workflow definition clearer.

There is a reason: not forcing sync-only server implementations to implement persistent storage of results, sticking to execute, return, forget.

I don't see how storage is involved here... or any issue about execute/return/forget.
What I want is only this:

  • given N=1 available outputs:
    • requested outputs omitted -> returns { "output1": "the value" }
    • requested outputs: {"output1": {}} -> returns "the value"
  • given N>1 available outputs:
    • requested outputs omitted -> returns { "output1": "the value", "output2": "the value" }
    • requested outputs: {"output1": {}} -> returns "the value"

The response structure is consistent regarding the specification of outputs in the request. It does not depend on the number of available outputs from the process.
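To make the difference concrete, the following sketch contrasts the behavior described for the current draft with the behavior requested in the list above. Both functions are illustrative only: the names `draft_shape` and `proposed_shape` and the returned labels are invented here, and `requested=None` stands for an execution request that omits `outputs` entirely.

```python
from typing import List, Optional


def draft_shape(available: List[str], requested: Optional[List[str]]) -> str:
    """Current draft as described in this thread: omitting "outputs"
    is equivalent to listing every available output."""
    effective = available if requested is None else requested
    if not effective:
        return "no content"  # explicit "outputs": {}
    return "direct value" if len(effective) == 1 else "JSON document"


def proposed_shape(available: List[str], requested: Optional[List[str]]) -> str:
    """Proposal in this comment: the shape depends only on the request,
    never on how many outputs the process happens to declare."""
    if requested is None:
        return "JSON document"  # all outputs, packaged, even when N=1
    if not requested:
        return "no content"  # explicit "outputs": {}
    return "direct value" if len(requested) == 1 else "JSON document"


# The two readings differ only for N=1 with "outputs" omitted.
print(draft_shape(["output1"], None))     # direct value
print(proposed_shape(["output1"], None))  # JSON document
```

Note that `proposed_shape` never reads `available`, which is the point of the proposal.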

@francescoingv

The outputTransmission was removed. There is no way to obtain the flexibility for individual outputs to be by value and others by reference. This version of the specification lost a lot of flexibility when shifting to the Prefer header, which is a shame. You are stuck with letting the server decide for you what is considered "large or not" to switch between value/links results.
There is an increasing disregard of alternative use cases other than sync-only/single-output.

@jerstlouis
Member

jerstlouis commented May 28, 2024

@fmigneault

Even if the process endpoint was well known and defined, it could change later on.

That is not what I meant by a "WellKnownProcess". See opengeospatial/ogcapi-geodatacubes#5 .
A WellKnownProcess would be defined by a URI, and have a stable definition registered with e.g., the OGC definition server at http://www.opengis.net/def/wkprocesses/{name}.

I don't see how storage is involved here..

As I mentioned in the other issue, the storage is not actually forced because you can always base64-encode the outputs as
values in the results.yaml, but you incur a ~35% overhead doing so which is significant on large binary outputs.
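As a rough check of the size penalty mentioned above: plain base64 turns every 3 bytes into 4 characters, so the expansion is about one third before any line wrapping or JSON string escaping is added, which is in the same range as the figure quoted. The snippet uses random bytes as a stand-in for a binary output.

```python
import base64
import os

raw = os.urandom(3_000_000)      # stand-in for a large binary output
encoded = base64.b64encode(raw)  # no line breaks are inserted
overhead = len(encoded) / len(raw) - 1
print(f"{overhead:.1%}")         # prints 33.3%
```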

However, because the requirement about omitting outputs says it is identical to list them all explicitly, there is no way to achieve this.

That is correct,

@francescoingv

outputTransmission is completely gone in the next version, see issue #326 .
Instead, this is replaced by Prefer: return=minimal vs. return=representation. The server can always decide on one or the other.

Not forcing implementations to support returning by reference allows always returning the outputs by value to execute, return and forget in simple sync implementations.

While a possible solution could be the process having defined two outputs: "file_content", "file_URL"
I wonder if this would be recommended.

Thanks for detailing the use case.
I do believe that the solution in the next version is indeed what you suggest with two separate outputs, one being the URL.

The Collection / Dataset output defined in Part 3 might also often be applicable to such scenarios. This sets up the execution request, and provides an OGC API endpoint where to retrieve results (without actually triggering any processing yet -- the client actually interested in the data can request an area of interest which can be processed / cached as needed).

You are stuck with letting the server decide for you what is considered "large or not" to switch between value/links results.
There is an increasing disregard of alternative use cases other than sync-only/single-output.

I'm not sure sync-only/single output is related to the value/links result. The reason things are this way is to allow the server as much flexibility as possible in how it decides to return things, because the implementation / deployment method might have some particular limitations, and because the server knows its processing capabilities and the output better than the client. While it is true that the client has slightly less control over the response (like no more response: raw | document or transmissionMode: value, reference), there were so many options as illustrated in the tables, that this was really overcomplicated and very difficult to implement correctly in both client and server in 1.0. See Table 11 for 1.0 vs. Table 13 in 1.1/2.0; and Table 12 for 1.0 vs. Table 14 in 1.1/2.0.

@jerstlouis
Member

@fmigneault

One reason in particular that changing the behavior about returning results.yaml or directly the one output is problematic is that the purpose of "outputs" is not only to select outputs, but also to select things like the output format.

So if the change was made to say that omitting outputs altogether always returns a results.yaml, you would still not be able to request this way while selecting a particular output format.

Requesting a single output, with 1.1/2.0 you can of course use content negotiation directly to do this instead of using outputs, like you can now also do from /results/{resultId} as well (though the server may not support negotiating a different type than the one initially requested at execution time).

@fmigneault
Contributor

@jerstlouis

My interpretation of version 1.0 is that the outputTransmission by reference was only applicable in synchronous execution for responses that generated more than one output.

This is incorrect.
Table 11 (7.11.4 in https://docs.ogc.org/is/18-062r2/18-062r2.html#toc33) allows sync+raw and 1 output reference, which results in 204 with a Link header.

Not forcing implementations to support returning by reference allows always returning the outputs by value to execute, return and forget in simple sync implementations.

Nothing was forced. Process descriptions were allowed to indicate whether they supported one or both outputTransmission approaches, just like sync-execute/async-execute indicate their execution support. A sync-only server that desires to support only by-value simply needed to define its processes with jobControlOptions: [sync-execute], outputTransmission: [value].
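A minimal sketch of how a 1.0 client could use that advertisement before submitting a request. The process description below is a hypothetical fragment (the `id` and `version` values are invented), and treating a missing `outputTransmission` field as "nothing advertised" is an assumption of this example, not a statement about the schema default.

```python
# Hypothetical fragment of a 1.0 process description for a sync-only,
# by-value-only process, as in the comment above.
process_description = {
    "id": "example-process",
    "version": "1.0.0",
    "jobControlOptions": ["sync-execute"],
    "outputTransmission": ["value"],
}


def advertises(description: dict, field: str, option: str) -> bool:
    """Return True if the description lists `option` under `field`."""
    # Assumption: an absent field is treated as "nothing advertised".
    return option in description.get(field, [])


# A client that can only work with links can refuse to submit up front.
if not advertises(process_description, "outputTransmission", "reference"):
    print("process does not advertise by-reference outputs")
```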

I do believe that the solution in the next version, or even for a single output/sync execution in version 1.0, is indeed what you suggest with two separate outputs, one being the URL.

I disagree. This is not a solution. The point of Prefer: return=minimal|representation, if applied by the server, is that the same output can be obtained both by link and by value. Servers that respect this should not have to deal with add-on outputs just because sync/value-only servers are lacking capabilities.

allows the server as much flexibility as possible in how it decides to return things

But too much flexibility makes servers not able to work together since they cannot even agree on how things are resolved between them and how to agree on specific behaviors.

because the server knows its processing capabilities and the output better than the client.

Not necessarily. The example given by @francescoingv shows just that. The client might know that the resulting link is used to update a website, and nothing more. The server might have no knowledge at all about this intended use. If the process/server advertises that it can support both modes, it should honor it as best as possible when the client asks for it.

really overcomplicated and very difficult to implement correctly in both client and server in 1.0

I'd argue otherwise as there was no ambiguity or implementation-specific interpretability before. If a server did not want to support one combination, it could simply indicate it as such by omitting the relevant options from its process description. If a server did not reply with the specific combination requested, we could quickly validate which one was misbehaving. Right now, server/client have to do a lot of guesswork, and this is very hard to implement when there is no explicit guidance on how to handle all possible results. When sending a request, I currently can never be sure I will get what I asked for, so I have to try handling all possible side effects.

One reason in particular that changing the behavior about returning results.yaml or directly the one output is problematic is that the purpose of "outputs" is not only to select outputs, but also to select things like the output format. So if the change was made to say that omitting outputs altogether always returns a results.yaml, you would still not be able to request this way while selecting a particular output format.

That is a good point. Relying on a distinct response parameter made things easier instead of inferring various combinations of formats based on a single outputs property. I would personally rather revert response or find another Prefer parameter for it separately than trying to deal with another alternate outputs definition for that case.

@jerstlouis
Member

@fmigneault

This is incorrect.

Right, I realized that after writing that and looking at the tables again, sorry. I had already edited my comment.

I would personally rather revert response or find another Prefer parameter for it separately than trying to deal with another alternate outputs definition for that case.

Hmmm. Could we perhaps use the minimal vs. representation here again to split the first row of Table 13?

So that by specifying Prefer: return=minimal, the client hints to the server that it would like a Link: rather than the full response for a single output?

@francescoingv
Author

@jerstlouis

A sync-only server that desires to support only by-value simply needed to define its processes with jobControlOptions: [sync-execute], outputTransmission: [value].

My issue is not about jobControlOptions,
it is only about the outputTransmission option, and specifically about a process that desires to support only by reference.
Is that case possible?

Also,

The client might know that the resulting link is used to update a website, and nothing more

The client would benefit from receiving either the reference or an error code, as simpler code is needed to handle the difference.
A server unable to return the requested reference for a specific request could possibly avoid even starting the processing.

While I understand the server returning a reference if a value is requested (e.g. if the returned value is "too big")
I don't see a use case where the client asks for a reference and is prepared to accept a value.

@gfenoy
Contributor

gfenoy commented May 29, 2024

My issue is not about jobControlOptions,
it is only about the outputTransmission option, and specifically about a process that desires to support only by reference.
Is that case possible?

In the past, the client application could be informed that a given process can only support reference (or value) by accessing the outputTransmission in the process description.

Nevertheless, this commit 90e70b2 removed it from the processSummary.yaml schema.

@francescoingv
Author

francescoingv commented May 29, 2024

@gfenoy

In the past, the client application could be informed that a given process can only support reference (or value) by providing the outputTransmission in the process description.

The issue is about the ability of the next version 1.1/2.0 to accommodate the presented use case, where:

  • the server can offer the result as reference,
  • the client is not able to use the result if provided as value.

If I understand it correctly:

  • the server cannot advertise whether it supports the result as value, as reference or both;
  • the client can ask for a Preference, but it has no assurance about what it will get back;
  • in the presented use case:
    • the server will process the request, possibly returning a value;
    • the client must examine the result to understand if it is a reference or a value, and possibly (if it received a value) report that the operation did not complete successfully.

If the above is correct, then the only solution I see for the presented use case is:

  • the process having defined the following outputs:
    • "file_content" (will return the result either as the file content, or as reference where to get the file content)
    • "file_reference" (will return the result either as a string representing a URL where to get the file content, or a reference where to get the URL)
  • the client requesting only the output "file_reference".

In this case, if the server cannot return the requested output (the URL) for the given request, then it will produce an error code.
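The workaround described above could look roughly like the following on the client side. This is a sketch under stated assumptions: the input name `source` and the example URLs are invented, only the output name `file_reference` comes from the comment, and no HTTP request is actually sent here.

```python
import json
from urllib.parse import urlparse

# Hypothetical execution request body: only "file_reference" is requested,
# so "file_content" is never transmitted.
execute_request = {
    "inputs": {"source": "https://example.org/data/input.nc"},
    "outputs": {"file_reference": {}},
}
body = json.dumps(execute_request)


def looks_like_url(value: object) -> bool:
    """Check that a returned output is a string holding an http(s) URL."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The client accepts the result only if it is usable as a link.
print(looks_like_url("https://example.org/results/file.nc"))  # True
print(looks_like_url({"unexpected": "inline content"}))       # False
```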

@fmigneault
Contributor

@francescoingv

You correctly understood the issue. The use case you want to achieve is completely supported with 1.0, but is not possible in the current revision. There is no way to "enforce" the value/reference format to return.

The file_content and file_reference approach does not seem like a "solution" IMO. The response schemas offer flexibility in returning any output either as a value or as a link according to Prefer: return. Given that, having to hack an alternate output for both modes is an anti-pattern/bad practice. Note also that in this situation, if outputs: {"file_reference": {}} was not explicitly requested, the file_content value would also be returned, meaning the response would possibly return an unnecessarily large amount of data although only the link was needed.

@jerstlouis

Prefer: response=minimal could make sense, but the server is still allowed to consider the data as "not big enough" to need a link. In such case minimal would still return the raw data rather than the Link.

@huard

huard commented May 31, 2024

For the record, our use cases (currently running over WPS) involve:

  • feeding of process outputs as inputs to other processes (usually netCDF files);
  • we usually prefer output links over raw data, because links allow the use of a streaming protocol (e.g. OPeNDAP, or now HTTP requests for chunks of Zarr files) to read data instead of making copies (outputs can be fairly large);
  • a "smart" client that builds a mock Python function for each process, giving users the impression that functions are executed locally.

Sorry if this is too vague to inform the discussions here.

@jerstlouis
Member

jerstlouis commented May 31, 2024

@fmigneault

Prefer: response=minimal could make sense, but the server is still allowed to consider the data as "not big enough" to need a link. In such case minimal would still return the raw data rather than the Link.

That is a necessary compromise to allow for implementations that are not required to implement persistent storage.

Note that in this case (and any case with Prefer: minimal/representation), it is not a matter of "big enough" or not.
The client is expressing with the Prefer: minimal header that it would like EVERYTHING (except strings, boolean and numbers) as links.

With Prefer: representation, the client is expressing that it would like the response to be self-contained.

The big enough discretion of the server is when a preference for minimal/representation is not expressed by the client.
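The reading given in this comment can be sketched as server-side decision logic. This is an illustration of that interpretation only, not normative text: the function names, the `can_store` flag, and the size threshold are assumptions, and the preference is parsed from the standard `return=minimal` / `return=representation` form of the Prefer header.

```python
import re
from typing import Any, Optional

SCALAR_TYPES = (str, bool, int, float)


def parse_return_preference(prefer_header: Optional[str]) -> Optional[str]:
    """Extract "minimal" or "representation" from a Prefer header, if present."""
    if not prefer_header:
        return None
    # Lenient parsing: accept preferences separated by "," or ";".
    for token in re.split(r"[;,]", prefer_header):
        name, _, value = token.strip().partition("=")
        if name.strip().lower() == "return":
            value = value.strip().strip('"').lower()
            if value in ("minimal", "representation"):
                return value
    return None


def choose_transmission(prefer_header: Optional[str], value: Any, size_bytes: int,
                        can_store: bool, large_threshold: int = 1_000_000) -> str:
    """Decide whether one output is sent as "value" or "reference"."""
    preference = parse_return_preference(prefer_header)
    if isinstance(value, SCALAR_TYPES):
        return "value"      # strings, booleans and numbers stay inline
    if preference == "representation":
        return "value"      # client asked for a self-contained response
    if not can_store:
        return "value"      # no persistent storage, so no link can be offered
    if preference == "minimal":
        return "reference"  # client asked for everything else as links
    # No preference expressed: the server's own "big enough" discretion applies.
    return "reference" if size_bytes > large_threshold else "value"
```

The `can_store` branch is where the compromise shows: a server without storage answers a `minimal` request with an inline value, and the client cannot tell that apart from a server that simply chose to do so.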

@fmigneault
Contributor

@jerstlouis

That is a necessary compromise to allow for implementations that are not required to implement persistent storage.

I have no problem with letting applications implement the simplest approach of only supporting direct return value, and with the standard allowing for this flexibility. That will help early adoption with reduced implementation requirements. However, I do not believe this justifies making it harder for other modes to work efficiently as well. My issue is that, for applications that do offer this support of alternate link/storage mode and raw/document formats, Prefer as it stands and the "server's judgement of large data" do not provide a guarantee that a specific format would be returned to the client.

When working simultaneously with servers that do support storage and others that don't, the request with Prefer can be exactly the same for each case, but the response will still differ case by case because of different support, which makes expected behavior erratic and lowers interoperability. I would much rather have a server explicitly respond with a "not supported" code than try to deal with any outcome from an "OK" response not matching the structure I expected.

Older definitions of outputTransmission=value|reference and response=raw|document allowed that, because a server could indicate in advance what it is willing to support for a given process. A server not implementing storage would only list outputTransmission: [value], nothing more. The client could properly negotiate supported modes based on what each process description indicated and handle received responses accordingly. Currently, the client needs to send the request without any guarantee and cross its fingers that the response will match what was asked for.
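A minimal Python sketch of that older negotiation, assuming a process description that lists its supported modes under "outputTransmission" and an output request that carries a "transmissionMode" member. The helper names, the process id, and the fallback to ["value"] when the list is absent are assumptions made for illustration.

```python
def pick_transmission_mode(process_description: dict, wanted: str) -> str:
    """Return the wanted mode if the process advertises it, else raise."""
    # Assumption: a description without the member supports "value" only.
    supported = process_description.get("outputTransmission", ["value"])
    if wanted not in supported:
        raise ValueError(
            f"process {process_description.get('id')!r} does not support "
            f"transmission mode {wanted!r}; supported: {supported}"
        )
    return wanted


def build_output_request(output_id: str, mode: str) -> dict:
    """Build the "outputs" member of an execution request for one output."""
    return {output_id: {"transmissionMode": mode}}


# Hypothetical description of a process on a server without storage.
description = {"id": "example-process", "outputTransmission": ["value"]}
mode = pick_transmission_mode(description, "value")
outputs = build_output_request("file", mode)
```

Here the client learns before sending anything that asking for "reference" would fail, instead of discovering it from the shape of an "OK" response.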

The client is expressing with the Prefer: minimal header that it would like EVERYTHING (except strings, boolean and numbers) as links.

This is also a regression IMO. There is no reason to limit this if the client knows that some outputs would be better used as links for chaining elsewhere while others could be obtained directly to avoid a subsequent request to retrieve their stored contents.

The big enough discretion of the server is when a preference for minimal/representation is not expressed by the client.

This is not my understanding of the current specification. While representation forces inline, minimal and omitting any preference both result in the server doing whatever it wants. In other words, use cases such as the ones expressed by @francescoingv, @huard and myself where a link is desired can never be guaranteed since the server can still deem it to be small enough and return it inline.

@bpross-52n
Contributor

SWG meeting from 2024-06-10: We will discuss this further at the TC session on June 19th.

fmigneault added a commit to crim-ca/weaver that referenced this issue Sep 10, 2024