Dataverse should return machine-readable metadata to requesting clients/servers (content negotiation) #3699
@landreev doesn't our Harvesting (OAI-PMH) implementation already do some content negotiation? The SWORD spec talks about content negotiation, but in practice our SWORD implementation is very simple: for files, the only content we accept is the one required by SWORD, which is a zip file. We require SWORD clients uploading files to Dataverse to send this header:
@jggautier does Harvesting count?
@landreev very helpfully provided context when I was trying to understand the difference between this and the harvesting Dataverse does now, so he's letting me post his comments on it :)
This type of content negotiation seems a lot more flexible.
One potential difference between this and harvesting is that there may be an assumption that the content-type negotiation happens at the dataset landing page, instead of at a separate harvesting endpoint.
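For contrast with negotiation at the landing page, an OAI-PMH harvest asks a separate endpoint for one record in a named metadata format. A minimal sketch, assuming the repository exposes OAI-PMH at `/oai` and that the record identifier is the dataset's persistent ID. The host `dataverse.example.edu` and the DOI `10.5072/FK2/EXAMPLE` are hypothetical. `verb`, `identifier` and `metadataPrefix` are the standard OAI-PMH 2.0 `GetRecord` arguments.

```python
from urllib.parse import urlencode


def oai_getrecord_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH 2.0 GetRecord request URL.

    The /oai path and the identifier form are assumptions about the server.
    """
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    # urlencode percent-encodes ':' and '/' inside the identifier
    return f"{base_url.rstrip('/')}/oai?{urlencode(params)}"


# Hypothetical host and identifier
url = oai_getrecord_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE")
```

Here the format is chosen by a query parameter on a dedicated endpoint, not by the `Accept` header on the landing page or PID URI. The response is always an XML envelope.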
@jggautier so what would you consider the "definition of done" to be for this issue? I think we could easily argue that Dataverse already meets the recommendation. We could write it up in the User Guide if you want. In addition to Harvesting, we have Export in various machine-readable formats. The standards-based ones are DDI and Dublin Core: http://guides.dataverse.org/en/4.7/admin/metadataexport.html
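The Export formats can also be fetched over HTTP. A minimal sketch, assuming the native-API endpoint `GET /api/datasets/export?exporter=NAME&persistentId=PID` described in the Dataverse API guide, and exporter names such as `ddi` and `oai_dc`. Check the guide for the installed version. The server and DOI are hypothetical.

```python
from urllib.parse import urlencode


def export_url(server: str, persistent_id: str, exporter: str) -> str:
    """Build a Dataverse native-API metadata export URL (endpoint assumed from the API guide)."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


# Hypothetical server and DOI; "ddi" and "oai_dc" are assumed exporter names
ddi_url = export_url("https://dataverse.example.edu/", "doi:10.5072/FK2/EXAMPLE", "ddi")
dc_url = export_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE", "oai_dc")
```

A client has to know this Dataverse-specific URL pattern in advance. That is the main difference from content negotiation on a persistent identifier.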
Sorry for this very late reply. Guess I didn't understand enough back then, and still have some questions. A Data Citation Roadmap for Scholarly Data Repositories recommends that "data repositories and identifier service providers such as identifiers.org or DataCite in addition may implement content negotiation for the persistent identifier expressed as HTTP URI, returning machine readable metadata in various formats." The article uses DataCite's implementation as an example:
This already works for Dataverse-based repositories that publish datasets with DataCite DOIs. So systems can use this content negotiation to get metadata about datasets with DataCite DOIs published in Dataverse-based repositories. But what's returned is the metadata that DataCite publishes. This doesn't work for getting the metadata that the Dataverse repository publishes. For example:
Systems could use Dataverse's API or OAI-PMH. But isn't the general value of the kind of content negotiation the article recommends that it's standardized and more stable, while systems' APIs might be organized differently from each other and could change over time? And OAI-PMH supports metadata only in XML, while this type of content negotiation allows for metadata in any format, like the JSON in the Schema.org examples above. These are the questions I'd ask to help define the "definition of done" for this issue:
I've been emailing the article's corresponding author, Tim Clark, who's looking into the questions in the last comment. This has also been discussed in the context of tools for assessing the "FAIR"ness of datasets, as part of the FAIRsFAIR project.
One can't implement content negotiation for URIs that are not under one's control. So Dataverse cannot implement content negotiation for a DOI HTTP URI, because it doesn't control those DOI URIs. DataCite and CrossRef can (and do), and in doing so they allow access to the metadata they have about a DOI-identified object. Signposting offers (among other things) a way to get to metadata about the object that is available at the end of a (Dataverse) repository:
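Signposting works through typed HTTP `Link` headers on the landing page. A `describedby` link points to a metadata record, and its `type` parameter names the format. A minimal sketch of how a client could pick those links out. It assumes a simple header with no commas or semicolons inside quoted parameter values; a full RFC 8288 parser would handle those. The URLs in the example header are hypothetical.

```python
import re
from typing import Optional

# One link-value: <URI> followed by ;-separated parameters up to the next comma.
# Simplification: assumes no ',' or ';' inside quoted parameter values.
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")
PARAM_RE = re.compile(r';\s*([A-Za-z*-]+)\s*=\s*"?([^";]*)"?')


def parse_link_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Split an HTTP Link header into (target URI, {param: value}) pairs."""
    links = []
    for m in LINK_RE.finditer(value):
        params = {k.lower(): v.strip() for k, v in PARAM_RE.findall(m.group(2))}
        links.append((m.group(1), params))
    return links


def describedby_links(value: str) -> list[tuple[str, Optional[str]]]:
    """Return (metadata URI, media type) for every rel="describedby" link."""
    return [
        (uri, params.get("type"))
        for uri, params in parse_link_header(value)
        # rel may hold several space-separated relation types
        if "describedby" in params.get("rel", "").lower().split()
    ]


# Hypothetical header a landing page might send
hdr = ('<https://doi.org/10.5072/FK2/EXAMPLE>; rel="cite-as", '
       '<https://dataverse.example.edu/meta.xml>; rel="describedby"; type="application/xml"')
metadata_links = describedby_links(hdr)
```

Because the links come from the repository's own landing page, the repository controls them. This avoids the problem of negotiating on a DOI URI it does not own.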
Thanks @hvdsomp. You wrote that "One can't implement content negotiation for URIs that are not under one's control." That's an incredibly helpful way to put it. It doesn't seem, then, like this is a recommendation that data repositories can actually implement, right? We can encourage the people who do control those URIs, but haven't implemented content negotiation, to implement it. I'm not sure how Handles work differently from DOIs, but at least 7 Dataverse repositories use them, and I'm not sure whether content negotiation works for their Handle URIs. Does anyone keeping an eye on this GitHub issue know more about Handles, or know someone who does? I'll wait a week before asking in other channels (Dataverse Google Group, Code4Lib mailing list, emailing admins of repositories using Handles).
I asked in the Dataverse Google Group but haven't had any replies yet. At @pdurbin's suggestion I also posted questions in the PID Forum, where I also referenced an older post in that forum that makes me question my understanding of this tenth recommendation and of content negotiation in general.
The technology underlying Handles and DOIs is the same, or, to put it differently, DOIs are Handles. But organizations like CrossRef and DataCite have implemented a lot of functionality on top of DOIs, including content negotiation with the DOI HTTP URI as a means to obtain metadata in various formats; see e.g. https://www.crossref.org/documentation/retrieve-metadata/content-negotiation/.
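DOI-level content negotiation works like this: a client requests the DOI's HTTP URI and states the format it wants in the `Accept` header. The resolver then redirects to the registration agency, which returns metadata in that format. A minimal standard-library sketch. The media types are taken from DataCite's content negotiation documentation; check the current list there, and Crossref's for Crossref DOIs. The DOI is hypothetical. The function only builds the request and does not send it.

```python
import urllib.request

# Media types from DataCite's content negotiation docs (assumption: verify
# against the current list; Crossref supports an overlapping set).
SCHEMA_ORG_JSONLD = "application/vnd.schemaorg.ld+json"
DATACITE_JSON = "application/vnd.datacite.datacite+json"
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_metadata_request(doi: str, media_type: str) -> urllib.request.Request:
    """Build, without sending, a content-negotiation request for a DOI.

    The same HTTP URI that sends a browser to the landing page returns
    machine-readable metadata when the Accept header asks for it.
    """
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": media_type},
        method="GET",
    )


req = doi_metadata_request("10.5072/FK2/EXAMPLE", SCHEMA_ORG_JSONLD)  # hypothetical DOI
# Sending it (network access required):
# with urllib.request.urlopen(req) as resp:
#     metadata = resp.read().decode("utf-8")
```

The response is the metadata DataCite or Crossref holds for the DOI, not the metadata the Dataverse repository itself publishes.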
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
Upon request from other machine clients and servers (e.g. other archives) accessing datasets through their persistent identifiers, Dataverse should be able to provide dataset metadata in available formats (JSON, DDI, etc.).
This is number 10 of the 11 recommendations made in A Data Citation Roadmap for Scholarly Data Repositories (https://doi.org/10.1101/097196).