Accept and Serve raw Binary with select MIME types #3715

Open
lmsurpre opened this issue Jun 15, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@lmsurpre
Member

lmsurpre commented Jun 15, 2022

Is your feature request related to a problem? Please describe.
From http://hl7.org/fhir/binary.html#rest

  • The _format overrides the accept header and SHALL be interpreted as using the standard FHIR mime types, even if the more generic mime types are given as a value.
  • When the read request has some other type in the Accept header, then the content should be returned with the content type stated in the resource in the Content-Type header. E.g. if the content type in the resource is "application/pdf", then the content should be returned as a PDF directly. The _summary parameter does not apply in this case.
  • due to the way the web infrastructure works, it is not possible to make blanket rules about the relationship between the "Accept" field in the HTTP request, and the return type, which is why there is no hard rule about this. However, the intent is that unless specifically requested, the FHIR XML/JSON representation is not returned
  • When binary data is written to the server (create/update - POST or PUT), the data is accepted as is and treated as the content of a Binary, including when the content type is "application/fhir+xml" or "application/fhir+json", except for the special case where the content is actually a Binary resource.
  • Note that when client requests a Binary resource using a generic mime type (application/xml, text/xml, or application/json), the server SHOULD return the content directly if the content-type format matches the requested mime type (e.g. if the Accept header is application/json, and the contentType is vnd.xacml+json). However, servers might not always be able to interpret mime types correctly, and clients SHOULD be prepared to receive either format.
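The last bullet can be read as a structured-syntax-suffix match. Below is a minimal sketch of that reading, assuming a hypothetical helper named `matches_generic_type`; neither the name nor the exact matching rule comes from the spec or from this project.

```python
# Assumption: a generic Accept type "matches" a stored contentType when the
# subtype is the generic format itself (json/xml) or ends in "+json"/"+xml".
GENERIC_SUFFIXES = {
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
}


def matches_generic_type(accept_type: str, content_type: str) -> bool:
    """Return True if content_type is compatible with the generic accept_type."""
    # Drop parameters such as charset or q-values and normalize case.
    accept = accept_type.split(";", 1)[0].strip().lower()
    content = content_type.split(";", 1)[0].strip().lower()
    suffix = GENERIC_SUFFIXES.get(accept)
    if suffix is None:
        return False  # not one of the generic types named in the spec text
    subtype = content.split("/")[-1]
    return subtype == suffix or subtype.endswith("+" + suffix)
```

With the example from the spec text, `matches_generic_type("application/json", "vnd.xacml+json")` is true, so the stored content would be returned directly instead of the FHIR wrapper.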

But it goes on to say this:

Binary resources are not constrained to any list of safe content types (content types without active elements such as scripting or executable code), and therefore can be of any content type and encoding. Therefore, extra care needs to be taken to validate the content of the Binary resource against malicious or malformed content. For more details see Security of Narrative, since the security issues are similar.

Describe the solution you'd like
Investigate options for safely accepting, storing, and serving Binary resources without their FHIR wrapper.

One nice suggestion I saw online is to serve the Binary resources from a separate base URL (to protect against things like XSS). That aligns nicely with my thinking that such "Binary storage" is better suited to a cloud storage API (rather than a FHIR server), but the requirement to be able to wrap that content in a FHIR Binary wrapper (which must be served from the same domain) makes that challenging.

Instead, I'm wondering if we can limit it to some configurable list of MIME types with a "safe" default.
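As a rough illustration of what such a configurable list could look like, here is a minimal sketch. The names `DEFAULT_SAFE_MIME_TYPES` and `is_servable_raw` are hypothetical, and the default set is only an assumption about which types count as "safe"; none of this reflects an existing server setting.

```python
# Hypothetical default allow-list: types without active content such as scripts.
# The issue does not say which types belong here; this set is an assumption.
DEFAULT_SAFE_MIME_TYPES = frozenset({
    "application/pdf",
    "image/png",
    "image/jpeg",
    "text/plain",
})


def normalize_mime_type(content_type: str) -> str:
    """Drop parameters such as charset and lowercase the type/subtype."""
    return content_type.split(";", 1)[0].strip().lower()


def is_servable_raw(content_type: str, allowed=DEFAULT_SAFE_MIME_TYPES) -> bool:
    """Return True if a Binary with this contentType may be served without its FHIR wrapper."""
    return normalize_mime_type(content_type) in allowed
```

A deployment could pass its own `allowed` set read from configuration; anything outside the list would still be available as a FHIR Binary resource.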

Describe alternatives you've considered
Don't support it.

Acceptance Criteria

  1. WHEN binary data is written to the server (create or update) under [base]/Binary
    THEN the data is accepted as is and treated as the content of a Binary, including when the content type is "application/fhir+xml" or "application/fhir+json", except for the special case where the content is actually a Binary resource.

  2. GIVEN a read request is made on a Binary resource
    AND there is a _format query parameter with one of json, xml, application/json, application/xml, application/fhir+xml, or application/fhir+json
    THEN a Binary FHIR resource is returned

  3. GIVEN a read request is made on a Binary resource
    AND the Accept header includes application/fhir+xml or application/fhir+json
    THEN a Binary FHIR resource is returned

  4. GIVEN a read request is made on a Binary resource
    AND there is NO Accept header with values application/fhir+xml or application/fhir+json
    AND there is NO _format query parameter with values json, xml, application/json, application/xml, application/fhir+xml, or application/fhir+json
    THEN the Binary resource is served with its configured MIME type
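Criteria 2 through 4 amount to a small decision function for reads. The sketch below is one possible reading, not the server's implementation; `choose_read_representation` and its return values `"fhir"` and `"raw"` are hypothetical names, and the `_format` value is assumed to be URL-decoded already.

```python
FHIR_MIME_TYPES = {"application/fhir+json", "application/fhir+xml"}
# _format values from criterion 2 that select the FHIR representation.
FHIR_FORMAT_VALUES = FHIR_MIME_TYPES | {"json", "xml", "application/json", "application/xml"}


def _media_types(accept_header):
    """Split an Accept header into lowercase media types, ignoring q-values and other parameters."""
    if not accept_header:
        return []
    return [
        part.split(";", 1)[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    ]


def choose_read_representation(accept_header=None, format_param=None) -> str:
    """Return "fhir" for the Binary resource wrapper or "raw" for the stored content."""
    # Criterion 2: _format overrides the Accept header.
    if format_param is not None and format_param.strip().lower() in FHIR_FORMAT_VALUES:
        return "fhir"
    # Criterion 3: an explicit FHIR MIME type in the Accept header.
    if any(media_type in FHIR_MIME_TYPES for media_type in _media_types(accept_header)):
        return "fhir"
    # Criterion 4: otherwise serve the content with the Binary's own MIME type.
    return "raw"
```

Note that under criterion 4 as written, an Accept header of plain `application/json` with no `_format` parameter yields the raw content, whereas `_format=application/json` yields the FHIR resource.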

Additional context
from https://chat.fhir.org/#narrow/stream/179166-implementers/topic/write.20payload.20sizes
Paul Church: I would handle large blobs only through the Binary resource, and have DocumentReference point to Binary whenever the content is large. For the Google implementation the max resource size is normally 10MB but once we add the Binary-specific API endpoint that will be able to take gigabytes - it doesn't have to do parsing, just stores binary blobs. This seems like the spec-intended pattern, although I'm not sure how widely implemented it is.

@lmsurpre
Member Author

lmsurpre commented Oct 4, 2022

We feel that it makes more sense to leverage modern cloud infrastructure like S3 or Azure Blob for these large binary payloads.
The FHIR "Attachment" datatype (used in places like DocumentReference) supports linking to the data via a url, as an alternative to using a Reference to a Binary resource, so it should already be possible to store this data outside of the FHIR Server.
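For illustration, a DocumentReference that points at externally stored content might look like the sketch below. The helper name, the storage URL, and the title are made up for the example; only the `status`, `content`, `attachment`, `contentType`, `url`, and `title` element names come from the FHIR DocumentReference and Attachment definitions.

```python
import json


def document_reference_with_external_attachment(url, content_type, title=None):
    """Build a minimal DocumentReference whose attachment links to data held outside the FHIR server."""
    attachment = {"contentType": content_type, "url": url}
    if title:
        attachment["title"] = title
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "content": [{"attachment": attachment}],
    }


# Hypothetical object-storage location; any URL the client can resolve would do.
doc = document_reference_with_external_attachment(
    "https://storage.example.com/reports/scan-001.pdf",
    "application/pdf",
    title="Scanned report",
)
print(json.dumps(doc, indent=2))
```

Access control for the linked URL then becomes the storage service's responsibility, which is one trade-off of keeping the payload out of the FHIR server.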
