-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2734 preserve orig filename #6774
Conversation
if (StringUtil.nonEmpty(extensionToRemove)) { | ||
String newFileName = filename.replace(extensionToRemove, originalExtension); | ||
return newFileName; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sekmiller Am I reading this correctly - that if the current filename has no extension ("foo"), this code will return "foo"? - shouldn't it still return "foo.<ORIG_EXT>"?
(although this shouldn't really happen - since ingested files should always have the ".tab" extension)
(also, it looks like we never checked for this "no extension" case before. To be really OCD though, if you call .replace(".tab", ".dta") on something exotic like "datafile.tabular.tab" will return "datafile.dtaular.dta". See my really old code in StoredOriginalFile.java - I use the regex version of replace there: .replaceAll(".tab$", ".dta"), to make sure I'm only replacing the end part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I will take another pass at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@landreev I put in your suggestion on using replaceAll and relying on the fact that a tabular file should always end in ".tab ". It occurs to me that any editing of the file label to anything other than foo.tab will break this (including your "no extension" case). How worried should we be about this, and any suggestions to code against it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered my own question while out for a run: so the replaceAll("tab$" which I put in should fix 100% of the cases. Since we know it won't I can modify the previous "look for an extension to replace" code by adding a $ to the replace string so that inadvertent internal instances of the extension are not replaced, and if there's no extension found just append the extension derived from the file type. All of this points out that even after all this we cannot be certain that we're returning anything like the actual original file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to JsonPrinter, I believe we also want to use this new method in UningestFileCommand.java and in StoredOriginalFile.java (where the filename is set when the saved original is downloaded via the access API; with inputStreamIO.setFileName(...))
simplifies the logic, removes unused code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made another quick change, to further simplify StoredOriginalFile. Looks great otherwise.
Was just talking about this with @donsizemore , we're excited to see this as part of Dataverse 5! |
What this PR does / why we need it:
This PR provides a method for storing the original file name for ingested tabular files. It also adds the original file name to the json printer for data files. For files that were uploaded previously the son printer will infer the original file name from the file metadata and a conversion of the extension based on the original file format field on the data table.
Which issue(s) this PR closes:
Closes #2734
Special notes for your reviewer:
None
Suggestions on how to test this:
Verify that ingested tabular files json export includes the original file name even if the file was ingested previously and thus does not have the original file name saved to the dataTable table.
Does this PR introduce a user interface change?:
No
Is there a release notes update needed for this change?:
No
Additional documentation:
There is a sql script that adds the new field to the dataTable table.