License Keyword #219
Comments
Hmmm. I'm not sure what the rules are in other countries, but in the USA it is not possible to copyright information. Only the expression of that information can be copyrighted.
(Disclaimer: I am not a lawyer. This is not legal advice. Check for your specific use case.) Are you sure? What does "information" include in that case? Figures (2D pixel meshes), maps, databases, etc. can easily contain created data with enough originality to fall under general IP and copyright, as far as I know. Even if copyright does not apply in a certain country, open data licenses cover more than this: https://en.wikipedia.org/wiki/Open_data Adding an explicit license can clarify the situation for people who try to use the data, e.g. in automated workflows (e.g. data mining) for meta-studies and ML. Some background:
The only discussion I can find in the US is about databases that are a compilation of other works.
Quite sure. For example from https://libguides.library.kent.edu/data-management/copyright :
The standard example is the case of a telephone directory. You can copyright the layout of the directory but you cannot copyright the data, so someone else is free to publish the same data as long as they use a different layout. For openPMD-based files the layout is mandated by the openPMD standard, so I do not believe anyone would have any copyright ownership of an openPMD-based file. Notice that I am talking about US law only. In the US there are no database rights. In the EU there are.
Interesting, thanks! Yes, it seems very different in other countries. As found via the copyright.gov link above, "enhancing" a database with copyrighted material is another thing that could trigger copyright. For example, if I carefully pre-select the particles I write to a file, store a post-processed result again in openPMD, etc., this is likely to trigger copyright. The telephone directory did not source that information; in most cases we do. Either way, keeping the default
After mulling it over a bit, I am against having a data license field in openPMD. If a person wants to keep their data private, so be it. But if an openPMD dataset is shared, I do not want to have to worry about rights. I do not like the idea that my actions in using an openPMD dataset may set me up for being sued.
A data license, just like open data and open source themselves, never implies that one is forced to publish the data, or even any derived works. This is a common misunderstanding of open source and open data and indeed runs against the freedoms of open-X standards.
In most countries, default copyright will apply as would be with Using well-established licenses just simplifies the situation; nobody has to use them. It just makes clear what the situation is -
I never said anything about mandatory sharing.
Actually I believe the opposite is true.
Oh then I misunderstood. I thought you mean this:
Which is totally fine, even with a specific license. (License terms can contain MoUs, commercial licenses, etc. for some users as well. Whether the writer of the file makes a valid claim is not our business.)
The bottom line for me is still that if someone hands me a data file, I do not want to have to check whether a certain field in that file gives me permission to use it.
Not a lawyer, but if a collaborator hands you a data file, this means you have an implicit, non-exclusive usage right to read it and nothing further, just as one would expect. They could still keep the data file closed otherwise and license it elsewhere. The use case is really the opposite, e.g. people who aggregate/crawl/scan data for meta-studies, training, etc. and cannot do a manual contact-and-inquire workflow for many individual data sets.
Well, if you want, the standard could be amended to say that by default any data file that uses the standard has no license restrictions, and that if any restrictions are to be placed, they have to be transmitted externally along with the file. I just do not want to be forced to look in the file itself.
I see your point. But just as before, we will not force people to make their data essentially public domain. Well, then it's easy: we limit entries to clearly defined FSF- and OSI-approved licenses and everything else is Data readers can also decide to abort on anything but the former and
Just going to jump in here to clarify some points. RE: @DavidSagan:
True. However, a file or a specific output is an expression of information. RE:
There seems to be some misunderstanding here. US copyright law is very clear that by default, all rights are reserved on any copyrightable work. Open source licenses are necessary for that reason -- they grant the right to copy copyrighted work, they waive things like implicit warranties, etc. Without a license, someone can sue you for doing anything at all with their data or code, even if they posted it to GitHub or some other public site. It seems like you're arguing that "raw facts" are not copyrightable, and that since the OpenPMD format is open and OpenPMD data is just physics facts, no OpenPMD files will ever be copyrightable. That's dubious at best -- you can find cases like this one with all kinds of arguments over the copyrightability of output. Given that it takes some serious knowhow to set up an OpenPMD run, I think it would be easy for someone to claim that the output of their particular run is copyrightable. It would take a lot more legal precedent than currently exists to prove that OpenPMD data are "pure facts", and that it doesn't take some ingenuity to select an interesting problem from the entire space of OpenPMD inputs. Moreover, I'm pretty sure OpenPMD outputs are not a pure function of the inputs. The machine and environment very likely matter at least somewhat. Anyway, the specific arguments don't really matter. Because the default is that all rights are reserved, the burden is on the user to show that OpenPMD files are "facts" they can "just use". So it's the unlabeled case where the IP rights are murky. See the Open Data Commons FAQ:
So, RE:
You already have to check. A license as proposed simplifies the process. Without it, you have to check with the author by email or something similarly cumbersome. With it, the rights are clearly enumerated, as with open source software licenses, and all you have to do is look at a familiar SPDX descriptor.
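For the crawling use case, such a check could be sketched in Python roughly as follows. The attribute name `license` and the value `other:unknown` are taken from the proposal in this issue; the allowlist, the function name `usable_datasets`, the fallback for a missing attribute, and the plain-dict model of each file's root attributes are assumptions for illustration only, not part of any openPMD API.

```python
from typing import FrozenSet, List, Mapping

# Assumption: the SPDX identifiers this particular crawler is willing to accept.
ALLOWED: FrozenSet[str] = frozenset({"CC0-1.0", "CC-BY-4.0"})


def usable_datasets(
    root_attrs_by_file: Mapping[str, Mapping[str, str]],
    allowed: FrozenSet[str] = ALLOWED,
) -> List[str]:
    """Return the names of files whose 'license' attribute is in the allowlist.

    Files without a 'license' attribute are treated as 'other:unknown'
    (an assumption of this sketch) and are therefore skipped.
    """
    selected = []
    for name, attrs in root_attrs_by_file.items():
        license_id = attrs.get("license", "other:unknown")
        if license_id in allowed:
            selected.append(name)
    return sorted(selected)


# Hypothetical catalog: file name -> root attributes read from that file.
catalog = {
    "run_a.h5": {"license": "CC0-1.0"},
    "run_b.h5": {"license": "other:unknown"},
    "run_c.h5": {},
    "run_d.h5": {"license": "CC-BY-4.0"},
}
print(usable_datasets(catalog))
```

Anything outside the allowlist is skipped rather than treated as an error, so unlabeled files still need the manual contact-the-author route.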
Thank you for the thoughts, feedback and context. I updated the proposed text in the description accordingly.
OK, so here we need to separate US law from, say, EU law. Under US law, facts are indeed not copyrightable. From https://www.copyright.gov/help/faq/faq-protect.html:
The article you cite is about copyrightable expression. It is not about facts. The openPMD syntax represents expression of facts and so is copyrightable. However, for EU law there is the concept of database rights (https://en.wikipedia.org/wiki/Database_right). This is not a copyright but a separate right. As in the US, copyright cannot be claimed on an openPMD file, but a database right can be claimed.
To my way of thinking, it is less cumbersome for any database rights notices to be external to the data file(s). I don't want my running programs to have to worry about checking for rights. The proper place for notifying people about possible rights problems is well before any programs are run. Also, when someone creates a file, I don't want my programs to have to worry about asking what kind of license they want to use. If you are really serious about database rights, think about the consequences. At least in accelerator physics, data files that are passed around never have rights notices, so if an accelerator physicist were forced to check the rights for every data file they get, this would represent a horrible waste of time and effort. Therefore I am strongly opposed to having a licensing field. In fact, there is a solution where no one ever has to worry about database rights. The solution is to use copyleft and have the openPMD standard mandate that database rights cannot be asserted on openPMD files. Of course, this would not prevent someone from keeping their data private if they want.
I fear you still assume that binary files that some of us sometimes call data bases are not subject to copyright. This ain't the case.
That is not correct. US law explicitly talks about "Uncreative collections of facts". That is very different from mere facts. The binary files in question here do not distribute only indices of data (such as library book lists or telephone books) but actual data as well. JPEGs, movie streams, et al. are all copyrightable data. If I convert a movie to another format, be it a different encoding or an HDF5 file, this does not remove the "creative spark" of a filmed scene that triggers copyright. Filming a river will not create "uncreative facts" either. The same is true if a scientist comes up with a simulation setup or a measurement setup. The output of such a simulation is more than the cited "Uncreative collections of facts" that are excluded in US law. Again, the distinguishing point is creativity in the copyright sense, which intentionally is an extremely low bar and is bound neither to the medium that records it nor to anything that would need to be considered scientifically creative (novelty, variation, etc.). It's also not our burden to implement a programmatic verification of license meta-data in our I/O routines. A scanner-printer also does not check if I am replicating a copyrighted image. An e-mail program does not check if one sends out a copyrighted If we want to improve the situation there is a simple solution: use licenses and build the selection of a data license into programs that create output. For example, explicitly licensing all created data under the CC0 license is as close as it gets to "don't worry" (inform your users about this in the software license & input).
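On the writer side, selecting a license when creating output could look roughly like the Python sketch below. It models the root attributes of an output file as a plain dict; the helper name `stamp_license`, the `DEFAULT_LICENSE` constant, and the `software` key in the example are hypothetical and only illustrate the idea of a code-wide CC0 default that users can override.

```python
from typing import Dict

# Assumption: the simulation code ships with CC0 as its documented default.
DEFAULT_LICENSE = "CC0-1.0"


def stamp_license(
    root_attrs: Dict[str, str], license_id: str = DEFAULT_LICENSE
) -> Dict[str, str]:
    """Return a copy of the root attributes with a 'license' entry set.

    The caller can pass another SPDX identifier (or an 'other:' value)
    to override the default chosen by the writing code.
    """
    if not license_id.strip():
        raise ValueError("license must not be empty")
    stamped = dict(root_attrs)  # leave the caller's dict untouched
    stamped["license"] = license_id
    return stamped


# Example: default license, then an explicit override.
print(stamp_license({"software": "demo-code"}))
print(stamp_license({"software": "demo-code"}, "CC-BY-4.0"))
```

The writing code picks the license once and records it with the data, so no reader is forced to verify anything at I/O time.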
Just for your information, the "mere physical facts" collected at CERN are also licensed appropriately: http://opendata.cern.ch/record/201 They have thought out and automated the handling of these unavoidable consequences.
Add a new data license keyword to openPMD /. This can be used to express open access (Creative Commons) licenses et al. and avoids decoupling this important information from the medium the data is stored in.
Short-hand identifiers are defined in SPDX as well:
https://spdx.org/licenses/
And we should keep a default/free text option for whatever non-free stuff people come up with.
Proposed / keyword (required): license
other:
other:unknown
CC-BY-4.0 for the Creative Commons Attribution 4.0 International License
CC-BY-SA-4.0 for the Creative Commons Attribution Share Alike 4.0 International License
CC0-1.0 for the Creative Commons CC0 waiver
other:unknown means that no information was provided by the data creator(s) about the restrictions or rights to use this data
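A reader could distinguish the three kinds of value proposed above with a small helper like the following Python sketch. The names `LicenseInfo` and `classify_license` are hypothetical, and the sketch assumes that every value not starting with `other:` is meant to be an SPDX short identifier; it does not validate the identifier against the SPDX list.

```python
from typing import NamedTuple, Optional


class LicenseInfo(NamedTuple):
    kind: str              # "spdx", "other" or "unknown"
    value: Optional[str]   # SPDX identifier or free text; None when unknown


def classify_license(raw: str) -> LicenseInfo:
    """Classify the string stored in the proposed 'license' attribute."""
    text = raw.strip()
    if text == "other:unknown":
        return LicenseInfo("unknown", None)
    if text.startswith("other:"):
        # Free-text terms follow the 'other:' prefix.
        return LicenseInfo("other", text[len("other:"):].strip())
    # Assumption: anything else is an SPDX short identifier (not verified here).
    return LicenseInfo("spdx", text)


for example in ("CC-BY-4.0", "CC0-1.0", "other:unknown", "other:internal MoU"):
    print(example, "->", classify_license(example))
```

Keeping `other:unknown` as its own kind lets downstream tools treat "no information given" differently from explicitly stated non-standard terms.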