Display deposited (rather than ingested) copy of tabular files #7956
There was a similar conversation about the "download all" button in #4000.
This was fixed for "download all" in pull request #4979.
I would support renaming this to .tsv if it is in fact not distinct (.tab vs. .tsv), but does the Dataverse software extract metadata and structure the .tab differently than a .tsv? Regarding the display of the .tab for end users: interestingly, some users do get confused and think their original data is gone or less prominent, so the improvement @pdurbin mentioned above is great! So perhaps it's just a matter of improving the labelling of the .tab file in the file listing? Could it be moved to the bottom of the list, or given a label such as 'Preservation copy'?
(The .tsv vs. .tab discussion is in #6006, but just to reiterate: these are literally the same format. If you want Excel to open a .tab file, you change the extension to .tsv.)
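To illustrate why the extension doesn't matter, here is a minimal Python sketch (the sample content and the helper name `read_tab_separated` are invented for illustration). It writes identical tab-separated content to a `.tab` and a `.tsv` file and parses both with the standard `csv` module, which only needs to be told the delimiter:

```python
import csv
import tempfile
from pathlib import Path

# Invented sample content: tab-separated values with a header row.
SAMPLE = "id\tname\tscore\n1\tAlice\t3.5\n2\tBob\t4.0\n"

def read_tab_separated(path: Path) -> list[dict[str, str]]:
    # Hypothetical helper: the file extension plays no role here,
    # only the delimiter passed to the reader does.
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))

with tempfile.TemporaryDirectory() as tmp:
    tab_path = Path(tmp) / "data.tab"
    tsv_path = Path(tmp) / "data.tsv"
    for p in (tab_path, tsv_path):
        p.write_text(SAMPLE, encoding="utf-8")
    from_tab = read_tab_separated(tab_path)
    from_tsv = read_tab_separated(tsv_path)

print(from_tab == from_tsv)  # True: same bytes, same parsed rows
```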
@TaniaSchlatter could the .tab also appear lower in the main file listing window (or have something to differentiate it from other original files?), not just in the download file access window?
@amberleahey, my quick response is that I see this as an opportunity for an automatic file tag, "Original Format", which users could filter on to change the order in the file table.
Sounds like this issue can be closed now, doesn't it? The distinction between original format and .tab in the dropdown menu is very handy!
@BPeuch - no, the main request here has not been implemented, although I think there's agreement for it. While, as @TaniaSchlatter notes, the original format is (now? not sure this is new) listed above the tab/archival format in the dropdown, the default download/display format continues to be .tab. See e.g. on demo.dataverse here: https://demo.dataverse.org/file.xhtml?persistentId=doi:10.70122/FK2/OXQWCP/DU8NM2&version=1.0
Oh, my bad, @adam3smith: I did not realize this format was still prominent specifically on file webpages.
Same on the dataset page: .tab is still the default display and download format in the list of files as well.
That is true. I thought the problem was only about the 'Download' button, but I can see the arguments for highlighting the original format. Users who cannot reuse, say, SPSS files will know to look for alternatives (such as a .tsv output) either way.
I've been looking into this at QDR and it looks relatively straightforward to change the display in the file table, on the file page, and in the file citations, i.e. giving the name, size, checksum, and content type of the original. And the download menu has already been changed to show the original as the first option. The one aspect of functionality I've run into so far where this change is somewhat problematic is allowing the filename to be edited (for a given dataset version). Currently, what you change is the name of the ingested file (i.e. the *.tab version), and if you change the file extension, you are changing the extension of the tab file. The original file gets <newname>.<original extension>. The mimetype itself is not updated (and can't otherwise be changed) if you do update the extension. I think this could be changed so you edit the name of the original instead, but that would involve dealing with any existing files where the tab version has had its extension changed, which would be more work. Alternatively, things could be changed so you can't edit extensions. That seems like it could work for QDR, but may not be acceptable generally. So, to move this forward, any thoughts are welcome on whether being able to change the extensions on the original and/or tab versions is needed, or on how to address legacy data if we just flip to allowing the original file's extension to be changed, etc.
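To make the renaming problem concrete, here is a small Python model of the behavior described in the comment above. It is a hypothetical sketch, not Dataverse code: the class `IngestedFile`, its fields, and the example values are invented, and it assumes that `<newname>` means the edited name minus its extension.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

@dataclass
class IngestedFile:
    # Hypothetical model: the editable label belongs to the ingested (.tab) copy.
    tab_label: str
    tab_mimetype: str
    original_extension: str

    def rename(self, new_label: str) -> None:
        # Only the label changes; the mimetype is NOT re-detected,
        # mirroring the limitation described above.
        self.tab_label = new_label

    @property
    def original_label(self) -> str:
        # Assumed rule: the original is named <newname>.<original extension>,
        # where <newname> is the edited label without its own extension.
        stem = PurePosixPath(self.tab_label).stem
        return f"{stem}.{self.original_extension}"

f = IngestedFile("survey.tab", "text/tab-separated-values", "sav")
f.rename("results.csv")  # user "changes the extension" of the tab copy
print(f.tab_label, f.tab_mimetype, f.original_label)
```

After the rename, the tab copy claims a `.csv` extension while its mimetype still says tab-separated, and the original keeps `.sav`. That mismatch is the legacy-data issue that flipping editing over to the original's name would have to untangle.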
What options do we have for changing the filename and extension via API? If we forbid changing the extension via the UI, can it still be changed via API? Is there a workaround, I mean. (By the way, it would be kind of cool if you could change the extension and have Dataverse re-detect the mimetype.)
I created #10067 to split off the specific aspect of editing extensions, make it more granular, and have the discussion on that aspect there 😃
FWIW: QDR did implement this, making the extension non-editable.
As the download behaviour has already been changed, are there also updates to the display format to show the originally deposited file format (e.g. .csv) instead of .tab?
@vkush this issue is still open, so no, the .tab is still always shown. Pull requests are welcome! ❤️
If the way QDR implemented this is acceptable, I can make a PR of that (or it could serve as a starting point). Presumably this could/should have an SPA issue as well?
@qqmyers please! ❤️
This feature request comes out of the discussion on data curation at the 2021 DV community meeting:
Current behavior
When an ingestable tabular file is deposited (.xlsx, .sav, .dta), the default download format (and the displayed file extension) is that of the ingested .tab version of the file. The original file format is available from the File access menu, together with file-level metadata and the explorer tools.
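The same split exists for API users. As I understand the Dataverse Data Access API, `/api/access/datafile/...` serves the ingested tab-delimited version of a tabular file by default, while adding `format=original` returns the saved original. A minimal sketch that only builds the two request URLs (no network call) for the demo file linked earlier in this thread; the helper name `datafile_url` is hypothetical:

```python
from urllib.parse import urlencode

def datafile_url(server: str, persistent_id: str, original: bool = False) -> str:
    # Hypothetical helper: addresses a file by persistent ID via the Data Access API.
    params = {"persistentId": persistent_id}
    if original:
        # Ask for the deposited ("saved original") file instead of the .tab copy.
        params["format"] = "original"
    return f"{server}/api/access/datafile/:persistentId?{urlencode(params)}"

server = "https://demo.dataverse.org"
pid = "doi:10.70122/FK2/OXQWCP/DU8NM2"
print(datafile_url(server, pid))                 # default: ingested .tab
print(datafile_url(server, pid, original=True))  # deposited original
```

This suggestion would essentially make the second form the default in the UI.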
Suggested behavior
I suggest that the deposited file format is better suited as the default download format, with .tab (or .tsv, as it should be called ;)) being available through the File access menu.
Rationale
There are several reasons deposited file formats are preferable:
On a more theoretical level, in the terminology of the OAIS reference model, we clearly have the SIP (the deposited file) and the AIP (the archived/preservation copy) defined, and the question is which of the two is the better DIP. I would argue that the better DIP is the more commonly usable and often richer data format, i.e. the deposited file. That's not just the case for Excel, but also for things like .sav files, which include rich metadata that reads nicely not just into SPSS but also into tools like R with appropriate packages.
cc @sbarbosadataverse who was also part of this discussion