Data license declaration capability #95
Comments
This seems like the kind of issue others have encountered, and stationxml should follow any existing solutions. Is there a compelling use case for a new element vs embedding the license in an XML comment, like:
or even just:
In other words, if there is a need for machine parsing of the license, then the syntax should probably be locked down even more than the given example. For example, requiring the abbreviation to be from https://spdx.org/licenses/ as opposed to each user making up their own abbreviations. If machine parsing is not needed, then the comment idea would work today with no schema changes, which is of course an advantage. Is there a compelling use case for a channel having a different license from its enclosing station and/or network? I can see a different network perhaps requiring a different license, but we should avoid "xml bloat" where every channel for every station in a network repeats the same license. Even just repeating the abbreviation adds a lot of noise to the xml. If incorporating a license is needed, it might be good to consider whether also incorporating a copyright is wise. Copyright and license are orthogonal concepts, but sometimes knowing one without the other is a problem. Lastly, of course, this sort of thing goes beyond seismologists and software developers and starts to get lawyers involved, so tread carefully.
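One way to avoid repeating the license on every channel would be to let lower levels inherit from the nearest enclosing level that declares one. The sketch below illustrates that idea only. The DataLicense element is the one proposed in this issue, the inheritance rule is an assumption and not something the schema defines, and the network, station, and channel codes are made up. Namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: DataLicense is the element proposed in this issue,
# not part of a released StationXML schema. Codes are invented.
DOC = """
<FDSNStationXML>
  <Network code="XX">
    <DataLicense abbreviation="CC-BY" URL="https://creativecommons.org/licenses/by/4.0/"/>
    <Station code="AAA">
      <Channel code="BHZ"/>
      <Channel code="BHN">
        <DataLicense abbreviation="CC0" URL="https://creativecommons.org/publicdomain/zero/1.0/"/>
      </Channel>
    </Station>
  </Network>
</FDSNStationXML>
"""


def effective_licenses(root):
    """Map "NET.STA.CHA" to the abbreviation of the nearest declared license.

    Assumed rule: a Channel without its own DataLicense inherits from its
    Station, and a Station without one inherits from its Network.
    """
    result = {}
    for net in root.findall("Network"):
        net_lic = net.find("DataLicense")  # direct child only
        for sta in net.findall("Station"):
            sta_lic = sta.find("DataLicense")
            for cha in sta.findall("Channel"):
                cha_lic = cha.find("DataLicense")
                # Most specific declaration wins. Compare against None
                # explicitly: childless elements are falsy in ElementTree.
                candidates = (cha_lic, sta_lic, net_lic)
                lic = next((c for c in candidates if c is not None), None)
                key = ".".join((net.get("code"), sta.get("code"), cha.get("code")))
                result[key] = lic.get("abbreviation") if lic is not None else None
    return result


licenses = effective_licenses(ET.fromstring(DOC))
print(licenses)
```

With a rule like this, a network-wide license is written once, and only the exceptions need their own element.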
I believe machine parsable is the primary goal of this format and design should be targeting that use. Also, details should be describable in the schema, and I don't think that's possible for comments. For these reasons the XML comment option is less desirable in my opinion. I completely agree that if we can find a list of abbreviations and/or other definitions and examples to draw from we should.
I do not believe licensing data is controversial. In my non-legal opinion, at this point we risk doing more harm than good by not having the ability to declare a license in standardized metadata. An issue that should also be considered is whether the declared license covers the metadata in addition to the data it describes. Traditionally we have treated metadata as "public domain" in the sense that it can be freely used with no restrictions (or requirements of citation). We should be clear on the scope of any declaration.
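To make the "locked down" option concrete, here is a minimal sketch of checking an abbreviation against a controlled list. The DataLicense element and its attributes are the ones proposed in this issue. The set below is a small hand-picked subset of identifiers from https://spdx.org/licenses/, and the function name is hypothetical. Note that the short labels used in the proposal (CC0, CC-BY) are not themselves SPDX identifiers, which carry a version, e.g. CC0-1.0 and CC-BY-4.0.

```python
import xml.etree.ElementTree as ET

# Hand-picked subset of SPDX license identifiers, for illustration only.
# A real validator would load the full, current list from SPDX.
KNOWN_SPDX_IDS = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0"}


def check_license(xml_fragment):
    """Return (abbreviation, is_known) for a proposed DataLicense element."""
    elem = ET.fromstring(xml_fragment)
    abbreviation = elem.get("abbreviation")
    return abbreviation, abbreviation in KNOWN_SPDX_IDS


strict = check_license(
    '<DataLicense abbreviation="CC-BY-4.0" '
    'URL="https://creativecommons.org/licenses/by/4.0/"/>'
)
loose = check_license('<DataLicense abbreviation="CC-BY"/>')
print(strict, loose)
```

A free-form abbreviation such as CC-BY fails the check because it does not say which version of the license applies, which is exactly the ambiguity a controlled vocabulary removes.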
I missed this, so worth documenting carefully. I thought you were talking about the stationxml, not the miniseed it relates to. "Data" is a pretty generic term, perhaps there is a way to make it clearer. The recommendation makes more sense now, and I agree a comment likely is not the right answer. Regardless, if we are licensing the waveforms, we might need to license the stationxml metadata too. Just because it is "meta" doesn't mean it isn't someone's property.
What I mean by this is the details may matter a lot. For example, if there are 2 data license elements, does that mean both have to be satisfied, or can the user pick between the two (and vs or)? Maybe better to have only one to avoid this ambiguity. Other seemingly small details can have outsized effects. This may be useful:
The licence on the stationXML document should be explicit. An attribute in markup should fit as we don't need to describe complex licencing (99% of cases will be CC-0?). The licence markup on the waveform data should have starttime/endtime attributes. There will be duplication between the DOI's metadata and stationXML metadata on this matter. So maybe we should write in the documentation which one is authoritative (DOI or stationXML?).
@jschaeff do you have a link for how DataCite does this? That would be helpful. If there was a starttime/endtime on a license, that could get complicated quickly. For example, you could think of the standard PASSCAL data policy as a license. It is proprietary for 2 years after collection, but CC-0 after 2 years, so the license changes with time. Guess I am wondering if the right answer is to provide a way to link to the actual license policy instead of trying to embed it directly, so keep only the url and not have the abbreviation or any text? Then complex cases can be handled by the license holder instead of by stationxml? That would mean that responsibility for dealing with any conflicts or confusion is totally on the license holder; all we provide is a place to put the URL. Common license types, like CC-0, would all use the Creative Commons url, so it is easy to tell. Would two elements like
Although not a stationxml issue, the marking of the license really needs to also be on the actual data itself. Not sure if there is a way to standardize how to do this in miniseed2?
It appears that DataCite is not very complete for complex license management (see pages 27 and 28 of https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf). However, RDA came out with a Machine Actionable DMP standard that allows fixed embargoes to be defined, but it does not allow rolling embargoes. With a simple URL to the license you would miss the machine-readable part. But in the end, I guess, what a machine must know is whether the waveform data is open or restricted now. So maybe the current license is enough, as suggested by Chad.
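To illustrate why a rolling embargo is hard to capture in a single static declaration, here is a small sketch of the "open or restricted now" question. It assumes a simplified reading of the policy described earlier in this thread (data becomes open a fixed number of years after collection). The function name and the two-year default are illustrative, not taken from any actual policy text or standard.

```python
from datetime import datetime, timezone


def is_open(collected, now, embargo_years=2):
    """Return True once embargo_years have passed since collection.

    Assumed rule: each sample is restricted until the same calendar date
    embargo_years later, then open.
    """
    target_year = collected.year + embargo_years
    try:
        release = collected.replace(year=target_year)
    except ValueError:
        # Collected on Feb 29 and the target year is not a leap year:
        # release on Mar 1 instead.
        release = collected.replace(year=target_year, month=3, day=1)
    return now >= release


collected = datetime(2020, 6, 1, tzinfo=timezone.utc)
before = is_open(collected, datetime(2021, 12, 31, tzinfo=timezone.utc))
after = is_open(collected, datetime(2022, 6, 1, tzinfo=timezone.utc))
print(before, after)
```

Because the answer depends on the age of each sample, the same channel can hold both restricted and open data at the same moment, so a fixed starttime/endtime pair on the license would need continual updating.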
The DataCite information is interesting; I propose that we reuse as much as possible of what they have created. They use
Alternative would be to use
Possible to add date ranges, or perhaps things like |
Currently there is no clear place to include a data license declaration in StationXML and doing so is becoming increasingly important.
One option is to add this by allowing a DataLicense element in the BaseNode definition, which would allow declaration at the Network, Station, and Channel levels.
The element would be optional and could occur any number of times. An abbreviation attribute allows declaration of the common label often used, e.g. CC0, CC-BY, etc. A URL attribute allows identification of the license text.
This is analogous to the Identifier element added in the 1.1 revision.
For example:
In the schema:
and