Advocacy tools to justify opening up private data sets #41
Take a look at: http://theodi.org/the-value-of-open-data
A wide range of open data use cases is shown here (the list is not exhaustive): http://frictionlessdata.io/user-stories/
How open data is being reused in the EU: https://www.europeandataportal.eu/en/highlights/how-open-data-being-re-used-europe
A VC fund spreads its investment across hundreds of small companies, expecting most to fail, a few to muddle along and a couple to succeed spectacularly. In the same way, 'giving away' our data on an attribution-only basis lets thousands of researchers around the world use it for hundreds of different purposes, and we can find all of that work because they cite their use of our data. Some of it may closely match our development goals and is likely to deliver results far faster than finding, hiring and managing such talent internally.
After a lot of searching for my specific use case, 'human physiological observations data', I had to narrow it down to 'in critical care' before finding any data at all. Disappointingly, re3data.org didn't list any vital-signs measurement data sets. Finally I found http://mimic.physionet.org/, which includes data similar to what I'd like to publish. I actually came across MIMIC before, in 2012 (see also this API), and am surprised that it is still the only repository of hospital observations data I can find. MIMIC aspired to migrate to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, a 2014 standard for this kind of data (its SQL schemas are all on GitHub). MIMIC's approach to attribution is to ask researchers to cite, in every paper that uses the database, a key 2016 article in Nature in which the MIMIC team announced it. That lets us see the work of everyone who has used the data in their research.
In case it's useful for your tests, here is the article's IPython source: https://github.com/MIT-LCP/mimic-iii-paper/
@rfinean Thanks for the pointer. That is actually on our list (line 78), and I'll dive right into it.
If we look at the older articles about MIMIC-II (from 2011), we can see thousands more citations.
@rfinean @tompollard None of the three notebooks in https://github.com/MIT-LCP/mimic-iii-paper/tree/master/notebooks ran through without error. I only documented the first error for one of them.
The Open Data Handbook is a good resource for advocating that data be published in a FAIR way.
Hi @Daniel-Mietchen, interesting to see this conversation here! Could you please point me to the issue you had running the MIMIC-III notebook? It was certainly working, and I'd like to fix it. I assume the cause is either (1) updates to packages or (2) testing the notebook on the current version of MIMIC (v1.4) rather than the earlier version it was written for.
@tompollard It's line 78 in this spreadsheet. For background, see https://markwoodbridge.com/2017/03/05/jupyter-reproducible-science.html.
Thanks @Daniel-Mietchen. The problem seems to be that the test user was trying to run a Python 2 notebook using Python 3.
I guess we could provide a virtual environment of some sort. Adding a requirements file to help with package installation would be useful, so I'll try to get around to that. The user would also need access to a password-protected dataset, so it's difficult to avoid a small amount of setup.
Hi @tompollard,
What arguments have commercial companies successfully used with their investors to justify "giving away" their data to the community for free?