-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add support for external files via RFC2017 mime types #1234
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me — I've got just a couple of questions.
lib/active_fedora/file.rb
Outdated
content_will_change! unless @content == string_or_io | ||
@content = string_or_io | ||
end | ||
|
||
def content | ||
local_or_remote_content(true) | ||
external_content? ? external_content : local_or_remote_content(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explicitly retrieve the external content URL from the description, and then retrieve it? Can't we just follow Fedora's redirect?
lib/active_fedora/file.rb
Outdated
def external_url | ||
return nil unless mime_type.start_with? "message/external-body" | ||
url = mime_type.split(';').at(2) | ||
url.nil? ? nil : url[/\"(.*?)\"/, 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a HTTP header parser we can use that would be more robust against slight changes to the external-body format? E.g., I know the spaces are optional, and I think you might be able to use single quotes or change the order of the tokens so the URL was in the second position, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't really find anything to do this parsing for us. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought one of the HTTP clients would have this (and they must internally), but I couldn't find anything. I did find a gem called content-type that will do this: https://rubygems.org/gems/content-type
ContentType.parse("message/external-body; access-type=\"URL\"; URL=\"http://example.org/1\"").parameters['url']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth pulling in this gem (of 2 commits from 3 years ago) along with its dependency parslet for this? Maybe it would be better to come up with something good enough that we own instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm overthinking this — we could probably even just use a regex like:
url.match(/url=['"](.+?)['"]/i).captures.first
fcad9a9
to
a8f601a
Compare
end | ||
|
||
def external_url=(external_file_url) | ||
self.mime_type = "message/external-body; access-type=URL; URL=\"#{external_file_url}\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjcolvar This definitely makes it easy. I'm wondering if we could use another property. Could we make content itself an HTML redirect so that when someone hits the resource it actually redirects to the location? Or does that work with using mime_type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fedora currently redirects if the file has a mime_type using message/external-body (RFC2017). This means the content is still stored but is never exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjcolvar interesting. so you could still have binary content on content
but the override to mime_type
would force a redirect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to check to make sure, but I think you could store content but you wouldn't be able to retrieve it. But once you changed the mime type to something else, then you could probably retrieve the content again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is currently how File
and Fedora work. I'm attempting to have .content
return the bits for either external or internal. Thus making the File
act like it either has external or internal content but never both at the same time. I had started with a separate .external_content
method but thought this approach clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@escowles Are you suggesting if I want to create an AF::File that points to a non-resolvable or non-HTTP URL then I might be better off storing that in another field (like ebucore:locator
) and setting empty content instead of using message/external-body
in the mime_type
?
I figured it would be better to use the same approach to store the external reference no matter the protocol or resolvability and mark the AF::File
as external content in all of those cases because that is what it is representing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjcolvar I don't know about this. Right now, Fedora sends redirects, but there is a lot of interest in having Fedora proxy content instead (both HTTP and potentially local files too). My main concern is that if we support anything beyond the currently-implemented behavior (which is already in use by several apps), it will complicate implementing whatever gets decided about proxying or supporting files instead of just HTTP. I'm hoping to discuss this with the new Fedora API spec group that's meeting tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@escowles Great! Looking forward to the outcome of that then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@escowles Any updates from the Fedora API spec group meeting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current proposal for the API spec is here: https://github.com/fcrepo/fcrepo-specification/wiki/External-Content (this wiki page has been generally agreed upon and will be incorporated into the spec shortly).
The basic pattern is different from the current implementation:
- Either
url
orlocal-file
are supported, and they are to be interpreted by the server. - The server will then proxy the content and the client doesn't need to know anything about the content being external.
- The location of the external content will be exposed to the client in a
Content-Location
header. - Supporting redirects like the current implementation is still undecided — if anyone has a use case for redirects that proxying won't cover, I'd recommend opening an issue in https://github.com/fcrepo/fcrepo-specification/issues
So, I think we should support the current behavior now, but I don't think we need to do anything about local-file
or other types right now.
@@ -147,6 +161,7 @@ def to_solr(solr_doc = {}, _opts = {}) | |||
end | |||
|
|||
def content=(string_or_io) | |||
raise "This file has external content. First call external_url=nil if you want to use content=." if external_content? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjcolvar might be nice to have a external_content_reset
method that does the setting to nil for you.
a8f601a
to
966ee74
Compare
@@ -196,7 +216,7 @@ def uploaded_file?(payload) | |||
end | |||
|
|||
def local_or_remote_content(ensure_fetch = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to settle on the word "external" instead of "remote" for this method (and any others that may be applicable)? We could alias :local_or_remote_content, :local_or_external_content
, and add a deprecation warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that remote_content here means the version of the content that lives in fedora as opposed to the local yet to be persisted version of the content. External content is different from those two and thus a new name. It might be appropriate to call this method :local_or_remote_or_external_content
now but that seems wordy and confusing.
Closing because I'm no longer working on this and might be irrelevant now. |
This PR takes the pattern used by Fedora and some Hydra implementors (see related links below) and enables it at the ActiveFedora::File level. The use case for Avalon is that we want to store technical metadata and track derivative files that do not live in our repository but somewhere external, possibly a streaming server.
Since the only way to determine if there is external content is to inspect the mime type, a request to fcr:metadata is required to get the file's content. You can see I had to remove the test in spec/integration/attached_files_spec.rb due to this additional request.Certain calls now require requesting fcr:metadata to get the mime type to check if there is external content.I'm guess there are things I've overlooked or edge cases that aren't handled so please give feedback on improvements!
The last failing test is related to #1233 and should disappear if I rebase once that is merged.Related tickets and code:
#1156
samvera/ldp#71
https://github.com/projecthydra/hydra-works/blob/master/lib/hydra/works/services/add_external_file_to_file_set.rb
https://github.com/lbiedinger/ldp/tree/redirects
https://github.com/pulibrary/plum/blob/master/app/jobs/ingest_file_job.rb