option to output ORE resource maps in JSON-LD ? #84
Comments
@cboettig Yeah, so the engineering doc that the vignette points to maybe isn't the best
The DataONE package model builds on the OAI-ORE package model, and the serialization supported by DataONE is RDF/XML. Relationships from the ProvONE data model are delivered to DataONE via this serialization (the resource map). Including these relationships is consistent with the ORE model as described [here](http://www.openarchives.org/ore/1.0/datamodel#GlobalRels). So, all this needs to be explained clearly in the vignette for the typical user.
@gothub that's great, thanks. I guess what's still not clear is the extent to which the package would be useful to someone interested in creating OAI-ORE or PROV serializations outside of the DataONE context. Which I guess is why I bring up this question in reference to the title issue, e.g. I think it would be good for any OAI-ORE toolkit to support the JSON-LD serialization they describe at https://www.openarchives.org/ore/0.9/jsonld , but if the goal isn't so general and DataONE isn't consuming that format, then obviously it would be out of scope.
@cboettig It's certainly possible to create a JSON-LD serialization from a DataPackage, but how do we determine if that is useful? The |
@gothub I was thinking these descriptions might have some value in and of themselves, or as a generic delivery mechanism (particularly with the provenance annotation of the files), but you're probably right that the use case is pretty limited. So such a stand-alone serialization probably doesn't make sense. Will close; at least I understand the picture a bit better now!
@cboettig and @gothub We never envisioned this as exclusively the DataONE packaging model, but rather as a generic data packaging model that could support multiple implementations. Our plan was to first support the ORE serialization, which has been widely implemented for over a decade and is semantically rich, and then later add others like OKFN-style data packages as described in issue #40. This package got started at OS Codefest, and @sckott and others were involved in the discussions to create a data package model that could act as an intermediary between client tools and repositories, but could also be used within R itself as a first-class data loading mechanism, given the shortcomings of R's current data handling (lack of support for metadata, multiple formats, etc.). So, there's a lot more that could be done. I see great value in a JSON-LD export from these packages, although currently the OKFN data-package spec is not rich enough to support the relationships that are expressed in the DataONE model, so it would be a bit lossy, mainly for provenance info. But it would still be useful.
@mbjones Thanks for clarifying, that background is very helpful, and renews my conviction that these abilities would be valuable generally. Re a JSON-LD export: right, I wouldn't involve the OKFN spec; note that my link goes to the ORE standard's own page on how to serialize ORE in JSON-LD, which I assume should make it straightforward and lossless(?). I think this would make the metadata both easier to visualize than RDF/XML and perhaps more appealing to other developers who might consume this data and/or combine or extend it with other formats (including PROV, as you already do). I do still think the |
@cboettig Yes, well, we have been calling these collections of data, EML, and a manifest "data packages" since around 2000 to distinguish them from the much more ambiguous concept of a data set. It's even baked into EML 2 itself as the

Right now, our major issue isn't with the use of "data package" per se, but rather that the qualifier "data" is often too limiting for what goes into these. They really are packages of research products, and can include data, code, metadata, graphics, text, multimedia, and other products of the research cycle. Other terms for these packages have been used by various members of the field, starting with Carl Lagoze's seminal work on Active Digital Objects in the 1990s, then the work in the UK on Research Objects around 2010 (see ref below), and more recent work by Victoria Stodden et al. on the concept of Research Compendia as envisioned by Gentleman (see ref below). See also the entire Open Archival Information System (OAIS) Reference Model, which is built around package concepts such as the Archival Information Package that is used extensively in libraries and national repositories such as NCEI. NIST is even looking at utilizing ORE and the DataONE model of incorporating ORE in BagIt to handle their next-generation packaging recommendations.

There is a huge literature on this stuff, and an equally large number of overlapping names and concepts for data packages. The whole field is richly interwoven, with equal contributions from the fields of data, workflow, and provenance, each with their own overlapping naming preferences and history. I'm not sure where that leaves us on naming. We've considered "research package" as an alternative, but have stuck with "data package" for historical continuity.

Lagoze, C., Lynch, C. A., & Daniel, R. (1996). "The Warwick Framework: a container

Bechhofer, S.; Bechhofer, S.; De Roure, D.; Gamble, M.; Goble, C.; Buchan, I. (2010). "Research Objects: Towards Exchange and Reuse of Digital Knowledge". Nature Precedings. doi:10.1038/npre.2010.4626.1

Gentleman, Robert, and Duncan Temple Lang. 2007. "Statistical Analyses and Reproducible Research." Journal of Computational and Graphical Statistics 16 (1): 1–23. doi:10.1198/106186007X178663. http://www.tandfonline.com/doi/abs/10.1198/106186007X178663.

P.S. Our R package was originally named |
As outlined here https://www.openarchives.org/ore/0.9/jsonld ?
Apologies if this doesn't make sense or is out of scope; I haven't really wrapped my head around DataONE data packaging. Is everything here currently always an RDF/XML serialization?
(The package name, README, and vignette can make it a bit ambiguous exactly which standard 'datapack' refers to, since the name unfortunately sounds similar to OKFN's JSON schema for a "data package": https://specs.frictionlessdata.io/data-package/. It could be made more obvious that it is the "DataONE package model" that is being implemented, which I gather builds on ORE and possibly PROV, but exactly how and to what extent isn't clear to me.)
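To make the request concrete, here is a rough sketch of what a resource map might look like when written as JSON-LD rather than RDF/XML. It is in plain Python rather than R, uses only the standard library, and is not the datapack API. The function name `resource_map_jsonld`, the `#aggregation` suffix, and all `example.org` identifiers are hypothetical. The vocabulary terms (`ore:describes`, `ore:aggregates`, `cito:documents`, `cito:isDocumentedBy`, `prov:wasDerivedFrom`) are real terms from ORE, CiTO, and PROV, but the document shape is only a generic JSON-LD `@graph` and is not claimed to match the context defined on the ORE JSON-LD page linked above.

```python
import json

# Namespace IRIs for the vocabularies discussed in this thread.
CONTEXT = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "cito": "http://purl.org/spar/cito/",
    "prov": "http://www.w3.org/ns/prov#",
}


def resource_map_jsonld(map_id, metadata_id, data_ids, derived_from=None):
    """Return a JSON-LD dict for one resource map and its aggregation.

    derived_from is an optional dict mapping a derived data id to the id
    of the data object it was derived from (a PROV relationship).
    All identifiers are supplied by the caller; none are real.
    """
    derived_from = derived_from or {}
    aggregation_id = map_id + "#aggregation"  # hypothetical naming choice
    members = [metadata_id] + list(data_ids)

    graph = [
        # The resource map describes the aggregation.
        {"@id": map_id, "@type": "ore:ResourceMap",
         "ore:describes": {"@id": aggregation_id}},
        # The aggregation lists every member of the package.
        {"@id": aggregation_id, "@type": "ore:Aggregation",
         "ore:aggregates": [{"@id": m} for m in members]},
        # The metadata object documents each data object.
        {"@id": metadata_id,
         "cito:documents": [{"@id": d} for d in data_ids]},
    ]
    for data_id in data_ids:
        node = {"@id": data_id,
                "cito:isDocumentedBy": {"@id": metadata_id}}
        if data_id in derived_from:
            # Provenance travels in the same graph as the packaging info.
            node["prov:wasDerivedFrom"] = {"@id": derived_from[data_id]}
        graph.append(node)

    return {"@context": CONTEXT, "@graph": graph}


# Usage with made-up identifiers: one metadata file and two data files,
# where the cleaned file was derived from the raw one.
doc = resource_map_jsonld(
    "https://example.org/resmap/1",
    "https://example.org/meta/1",
    ["https://example.org/data/raw", "https://example.org/data/clean"],
    derived_from={"https://example.org/data/clean":
                  "https://example.org/data/raw"},
)
print(json.dumps(doc, indent=2))
```

Because the provenance statement sits in the same graph as the aggregation, a sketch like this would not lose the relationships that a plain OKFN-style descriptor has no place for, which is the reason for pointing at ORE's own JSON-LD form rather than the OKFN spec.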