Revisit/reimplement the concept of a "Harvested file". #8629
Comments
This is probably doable in one sprint's worth of time... But let's decide if we actually want to do this ("revisit" being the key word), and/or whether we want to address other, more urgently needed harvesting issues first.
This will be a spike:
Sizing:
@cmbz This was discussed during a tech hour. We concluded that it wasn't worth trying to heavily redesign the current setup, e.g. by introducing a new database object dedicated to representing a harvested file. But we decided to do one small/simple thing: move the column So we can do one of 2 things: close this issue as a completed spike and open a quick dev. issue for implementing the change above, or change the title of this issue and use it for scheduling and implementing it. The former is probably cleaner (?).
@landreev I like your first suggestion: "close this issue as a completed spike, and open a quick dev. issue for implementing the change above". Thank you! :)
Short version: "Harvested files" are currently stored as DvObject/DataFile/FileMetadata/etc. entities, just like "real" files. I don't think they should be handled that way.
(I feel like I have a memory of opening an issue for this, but looks like I never did - ?)
History: "Harvested Files" are created locally when a Harvesting client imports DDI or native JSON dataset metadata records with file entries from other Dataverses (the DC format has no mechanism for encoding files or any other kinds of child objects). The reason they become DataFiles/DvObjects is a legacy of the old implementation in DVN v2-3. Back then they were treated as actual files that users could download locally: they stored the remote location (URL) in place of the physical file name, and DVN would make an HTTP call to get and proxy the content, transparently to the user. We abandoned that scheme as overly complicated (the problem with authentication was never fully resolved, among other things).

So in the current scheme these "files" are used only for indexing. We still attempt to store a link to the remote object (as the storageidentifier of the DvObject), but it is never used in practice. When search hits for harvested files are displayed, no attempt is made to redirect the user specifically to that file; clicking on the card always sends them to the remote location of the dataset to which the file belongs. This really doesn't justify maintaining the same DvObject hierarchy of entities as for "real" files, IMO.

The concept of a "remote file", something that transparently appears as a DataFile to the local user, with the byte content stored elsewhere/remotely, is now being revisited (#7324). Once we have that, we may consider, as an optional/configurable harvesting feature, being able to turn harvested files into these "remotely stored" files locally. But when harvesting file records solely for indexing, I believe we should instead introduce some "HarvestedFileMetadata" entity for storing them.
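To make the idea a bit more concrete, here is a rough sketch of what such a "HarvestedFileMetadata" object could look like. Everything in it is an assumption for illustration only: the field names, the choice of the local dataset id as the link to the parent, and the `searchCardTarget` helper are hypothetical and not existing Dataverse code. It is written as a plain Java class, with no persistence annotations, so that it stands alone.

```java
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical sketch: a lightweight holder for a harvested file record.
 * It keeps only what indexing and display would need, and is deliberately
 * NOT part of the DvObject/DataFile hierarchy.
 */
final class HarvestedFileMetadata {
    private final long datasetId;       // assumed: local id of the harvested dataset
    private final String label;         // file name as it appeared in the harvested record
    private final String contentType;   // MIME type; may be absent in the harvested record
    private final String remoteFileUrl; // link to the remote file; informational only

    HarvestedFileMetadata(long datasetId, String label,
                          String contentType, String remoteFileUrl) {
        this.datasetId = datasetId;
        this.label = Objects.requireNonNull(label, "label");
        this.contentType = contentType;
        this.remoteFileUrl = remoteFileUrl;
    }

    long getDatasetId() { return datasetId; }

    String getLabel() { return label; }

    Optional<String> getContentType() { return Optional.ofNullable(contentType); }

    Optional<String> getRemoteFileUrl() { return Optional.ofNullable(remoteFileUrl); }

    /**
     * Mirrors the current behavior described above: a search hit for a
     * harvested file links to the remote dataset, never to the file itself.
     */
    String searchCardTarget(String remoteDatasetUrl) {
        return Objects.requireNonNull(remoteDatasetUrl, "remoteDatasetUrl");
    }
}
```

The point of the sketch is only that such a record needs a handful of descriptive fields plus a pointer to its dataset; none of the storage, permissions, or versioning machinery that comes with a DvObject is involved.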
Definition of done: