FAIR Protocol Buffer? #17

Open
krobasky opened this issue Mar 30, 2018 · 5 comments

@krobasky

I see this repo is under 'fair-research' - has anybody started on defining a FAIR protocol buffer?

@mikedarcy
Collaborator

Apologies, but it is not clear to me what "FAIR protocol buffer" is supposed to mean in the context of the bdbag software. Would it be possible for you to provide some more detail or reference material?

@krobasky
Author

krobasky commented Apr 3, 2018

Hi Mike - perhaps my question is misplaced. It relates to the metadata a bdbag needs in order to be FAIR: provenance, unique identifiers, keywords, licensing, that sort of thing. Thoughts?

@ianfoster
Contributor

The (BD)Bag specification describes a container: it is silent on many of the issues raised in the FAIR principles, like data licenses and vocabularies. However, the metadata directory provides a natural place to address those issues. We can, for example, include Research Object (RO) metadata: see https://github.com/fair-research/bdbag/blob/master/profiles/bdbag-ro-profile.json. (See https://n2t.net/minid:b9dt2t for an example of a BDBag that includes simple RO metadata.)

As Carl Kesselman noted in a recent email exchange, one could address the licensing issue, for example, by:

  1. Adding the actual license text as an asset in the BDBag and making it accessible either in the data directory or via fetch.txt
  2. Using the key/value metadata in the BDBag to associate a license URI or PID with the bag. We could easily extend the profile for BDBag to include this. Extending the key/value metadata is a standard part of the BagIt spec, so this is totally acceptable.
  3. Specifying the license as additional research object metadata that you associate with an asset (i.e. file) along with the other file-specific attributes, such as the file type from OBI.

If such conventions are defined, we can integrate them into the BDBag tools.
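
As an illustration of option 2, here is a minimal sketch of setting a license tag in the text of a bag-info.txt file. It uses only the Python standard library and does not call the bdbag API. The tag name `License-URI` and the function name `set_license_tag` are assumptions made for this example, not part of any agreed convention or profile.

```python
def set_license_tag(bag_info_text, license_uri, key="License-URI"):
    """Return bag-info.txt content with exactly one `key: license_uri` tag.

    `key` is a hypothetical tag name chosen for illustration only.
    """
    kept = []
    skipping = False
    for line in bag_info_text.splitlines():
        # Lines starting with whitespace continue the previous tag's value.
        if line[:1] in (" ", "\t"):
            if not skipping:
                kept.append(line)
            continue
        # Drop any existing tag with the same name (case-insensitive match).
        skipping = line.lower().startswith(key.lower() + ":")
        if not skipping:
            kept.append(line)
    kept.append(f"{key}: {license_uri}")
    return "\n".join(kept) + "\n"


example = "Bagging-Date: 2018-04-03\nLicense-URI: http://old.example/license\n"
updated = set_license_tag(example, "http://www.apache.org/licenses/LICENSE-2.0")
```

Because bag-info.txt is a tag file, any tag manifest in the bag would have to be regenerated after such a change so that its checksums still match.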

@krobasky
Author

krobasky commented Apr 4, 2018

A student and I have been reviewing various community FAIR efforts and mapping them to requirements for a simple metadata model. We considered ambitious, rigorous efforts such as DATS and HCLS, and decided to start with a more rudimentary, well-scoped set of requirements that are computable but decoupled from implementation. For example, we took into account the convention you describe for licensing, and we also account for versioning of objects, APIs, and even IDs (for example, AAC53040 is the accession ID for the p53 protein sequence object, and its most recent version is AAC53040.1).
What is the best format for sharing these conventions for your consideration and feedback? Would a protocol buffer be a proper format, or a JSON, or...?
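
To make the versioned-ID point concrete, here is a small sketch that splits an identifier such as AAC53040.1 into an accession and a version, then serialises the result as JSON. The `VersionedId` class, the `parse_versioned_id` function, and the "trailing `.N` means version N" rule are assumptions for illustration only, not an agreed convention.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionedId:
    accession: str
    version: Optional[int]  # None when the identifier carries no version


def parse_versioned_id(identifier: str) -> VersionedId:
    """Split 'ACCESSION.N' into its parts; assumes a numeric suffix is a version."""
    base, sep, suffix = identifier.rpartition(".")
    if base and sep and suffix.isdigit():
        return VersionedId(base, int(suffix))
    return VersionedId(identifier, None)


record = asdict(parse_versioned_id("AAC53040.1"))
print(json.dumps(record, sort_keys=True))
```

The same two fields could equally be expressed as a protocol buffer message; JSON is used here only because it needs no extra tooling.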

@stain

stain commented Apr 23, 2018

I agree that more needs to be done to expand the FAIR metadata needed.

Many of those requirements are covered by the underlying specs; for instance, Research Object Bundle manifests list basic provenance per resource. BDBags support RO manifests through the bdbag_ro.py module.

I will admit that license was not listed there. In theory we can use the dct:license property (from Dublin Core Terms) in metadata/manifest.json, which lets you assign a license per aggregated file. It is, however, not directly listed in the RO spec, so it would be a JSON-LD extension that would need to be added manually by bdbag_ro.py, for instance:

"aggregates": [
  {  "uri": "../data/file.txt",
     "dct:license": {
       "uri": "http://www.apache.org/licenses/LICENSE-2.0",
       "name": "Apache License, Version 2.0" 
     }
  }
]

But this should probably be fed upstream for inclusion in a general Research Object profile of FAIR metadata attributes.

There is also schema.org/license, as used, for instance, by BioSchemas Dataset.
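
The dct:license extension described above could be added programmatically along these lines. This is only a sketch that operates on a plain dict using the standard library; it does not use the bdbag_ro.py API. The function name `with_license` is made up for this example. It declares the `dct` prefix explicitly in `@context` in case the base context does not already define it, which is an assumption about how the manifest should be extended.

```python
import json


def with_license(manifest, file_uri, license_uri, license_name):
    """Return a copy of an RO manifest dict with dct:license set on one aggregate."""
    result = json.loads(json.dumps(manifest))  # deep copy of JSON-compatible data

    # Make sure @context is a list, then declare the Dublin Core Terms prefix.
    context = result.setdefault("@context", [])
    if not isinstance(context, list):
        context = result["@context"] = [context]
    dct_prefix = {"dct": "http://purl.org/dc/terms/"}
    if dct_prefix not in context:
        context.append(dct_prefix)

    # Find the aggregate for this file, or create one if it is not listed yet.
    for entry in result.setdefault("aggregates", []):
        if entry.get("uri") == file_uri:
            break
    else:
        entry = {"uri": file_uri}
        result["aggregates"].append(entry)

    entry["dct:license"] = {"uri": license_uri, "name": license_name}
    return result


manifest = {
    "@context": ["https://w3id.org/bundle/context"],
    "aggregates": [{"uri": "../data/file.txt"}],
}
licensed = with_license(
    manifest,
    "../data/file.txt",
    "http://www.apache.org/licenses/LICENSE-2.0",
    "Apache License, Version 2.0",
)
```

Returning a copy rather than mutating the input keeps the original manifest available for comparison.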
