Accept and Serve raw Binary with select MIME types #3715

lmsurpre · 2022-06-15T17:02:42Z

Is your feature request related to a problem? Please describe.
From http://hl7.org/fhir/binary.html#rest

The _format overrides the accept header and SHALL be interpreted as using the standard FHIR mime types, even if the more generic mime types are given as a value.
When the read request has some other type in the Accept header, then the content should be returned with the content type stated in the resource in the Content-Type header. E.g. if the content type in the resource is "application/pdf", then the content should be returned as a PDF directly. The _summary parameter does not apply in this case.
Due to the way the web infrastructure works, it is not possible to make blanket rules about the relationship between the "Accept" field in the HTTP request, and the return type, which is why there is no hard rule about this. However, the intent is that unless specifically requested, the FHIR XML/JSON representation is not returned.
When binary data is written to the server (create/update - POST or PUT), the data is accepted as is and treated as the content of a Binary, including when the content type is "application/fhir+xml" or "application/fhir+json", except for the special case where the content is actually a Binary resource.
Note that when client requests a Binary resource using a generic mime type (application/xml, text/xml, or application/json), the server SHOULD return the content directly if the content-type format matches the requested mime type (e.g. if the Accept header is application/json, and the contentType is vnd.xacml+json). However, servers might not always be able to interpret mime types correctly, and clients SHOULD be prepared to receive either format.

But it goes on to say this:

Binary resources are not constrained to any list of safe content types (content types without active elements such as scripting or executable code), and therefore can be of any content type and encoding. Therefore, extra care needs to be taken to validate the content of the Binary resource against malicious or malformed content. For more details see Security of Narrative, since the security issues are similar.

Describe the solution you'd like
Investigate options for safely accepting, storing, and serving Binary resources without their FHIR wrapper.

One nice suggestion I saw online is to serve the Binary resources from a separate base URL (to protect against things like XSS). That aligns nicely with my thinking that such "Binary storage" is better suited to a cloud storage API than to a FHIR server, but the requirement to be able to wrap that content in a FHIR Binary wrapper (which must be served from the same domain) makes that challenging.

Instead, I'm wondering if we can limit it to some configurable list of MIME types with a "safe" default.
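
To make the allow-list idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the name `is_servable_raw`, the `DEFAULT_SAFE_MIME_TYPES` set, and the choice of which types count as "safe" are hypothetical and not taken from the server's code or configuration.

```python
from typing import Iterable, Optional

# Hypothetical default; which types are actually "safe" would need a security review.
DEFAULT_SAFE_MIME_TYPES = frozenset(
    {"application/pdf", "image/png", "image/jpeg", "text/plain"}
)


def is_servable_raw(
    content_type: Optional[str],
    allowed: Iterable[str] = DEFAULT_SAFE_MIME_TYPES,
) -> bool:
    """Return True if a Binary with this contentType may be served without its FHIR wrapper."""
    if not content_type:
        return False
    # Compare only the media type; drop parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in {t.lower() for t in allowed}
```

A deployment could then pass its own list, for example `is_servable_raw("text/html", allowed={"text/html"})`, while anything outside the list would still be available only as a FHIR Binary resource.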

Describe alternatives you've considered
Don't support it.

Acceptance Criteria

WHEN binary data is written to the server (create or update) under [base]/Binary
THEN the data is accepted as is and treated as the content of a Binary, including when the content type is "application/fhir+xml" or "application/fhir+json", except for the special case where the content is actually a Binary resource.

GIVEN a read request is made on a Binary resource
AND there is a _format query parameter with one of json, xml, application/json, application/xml, application/fhir+xml, or application/fhir+json
THEN a Binary FHIR resource is returned

GIVEN a read request is made on a Binary resource
AND the Accept header includes application/fhir+xml or application/fhir+json
THEN a Binary FHIR resource is returned

GIVEN a read request is made on a Binary resource
AND there is NO Accept header with values application/fhir+xml or application/fhir+json
AND there is NO _format query parameter with values json, xml, application/json, application/xml, application/fhir+xml, or application/fhir+json
THEN the Binary resource is served with its configured MIME type
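
The read criteria above can be sketched as a single decision function. This is only an illustration in Python, not the server's implementation; `choose_representation` and its helper are hypothetical names, and the Accept parsing is deliberately simplistic (it ignores q-value ordering and wildcards).

```python
from typing import Optional, Set

FHIR_MIME_TYPES = {"application/fhir+json", "application/fhir+xml"}
# _format values that ask for the FHIR representation, per the criteria above.
FHIR_FORMAT_VALUES = FHIR_MIME_TYPES | {
    "json", "xml", "application/json", "application/xml",
}


def _media_types(accept: Optional[str]) -> Set[str]:
    """Split an Accept header into lowercase media types, dropping parameters."""
    if not accept:
        return set()
    return {
        part.split(";")[0].strip().lower()
        for part in accept.split(",")
        if part.strip()
    }


def choose_representation(format_param: Optional[str], accept: Optional[str]) -> str:
    """Return "fhir" for the Binary resource wrapper, or "raw" for the stored content."""
    if format_param is not None and format_param.strip().lower() in FHIR_FORMAT_VALUES:
        return "fhir"
    if _media_types(accept) & FHIR_MIME_TYPES:
        return "fhir"
    # Neither _format nor Accept asked for FHIR: serve with the Binary's own contentType.
    return "raw"
```

For example, `choose_representation("json", "application/pdf")` yields `"fhir"` because `_format` is checked first, while `choose_representation(None, "application/pdf")` yields `"raw"`.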

Additional context
from https://chat.fhir.org/#narrow/stream/179166-implementers/topic/write.20payload.20sizes
Paul Church: I would handle large blobs only through the Binary resource, and have DocumentReference point to Binary whenever the content is large. For the Google implementation the max resource size is normally 10MB but once we add the Binary-specific API endpoint that will be able to take gigabytes - it doesn't have to do parsing, just stores binary blobs. This seems like the spec-intended pattern, although I'm not sure how widely implemented it is.

lmsurpre · 2022-10-04T13:14:20Z

We feel that it makes more sense to leverage modern cloud infrastructure like S3 or Azure Blob for these large binary payloads.
The FHIR "Attachment" datatype (used in places like DocumentReference) actually supports linking to the data via a url (vs using a Reference to a Binary resource), so it should already be possible to store this data outside of the FHIR Server.
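
As a rough illustration of that pattern, the sketch below builds a minimal DocumentReference whose Attachment points at externally stored content. The storage URL and the helper name `attachment_by_url` are made up for the example; only the `contentType` and `url` elements of Attachment are used.

```python
import json


def attachment_by_url(content_type: str, url: str) -> dict:
    """Build an Attachment that links to the data instead of embedding it."""
    return {"contentType": content_type, "url": url}


# Hypothetical object-storage location; any URL the client can resolve would do.
doc_ref = {
    "resourceType": "DocumentReference",
    "status": "current",
    "content": [
        {
            "attachment": attachment_by_url(
                "application/pdf",
                "https://storage.example.com/reports/report-123.pdf",
            )
        }
    ],
}

print(json.dumps(doc_ref, indent=2))
```

With this shape the FHIR server only stores the small DocumentReference, and the large payload is fetched directly from the storage service.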

lmsurpre added the "enhancement" (New feature or request) label on Jun 15, 2022

lmsurpre mentioned this issue Jun 15, 2022

Warn fhir-smart users that Binary.securityContext will not affect access control #3716

Closed
