Advocacy tools to justify opening up private data sets #41

Closed
rfinean opened this issue Mar 4, 2017 · 15 comments

Comments

@rfinean

rfinean commented Mar 4, 2017

What arguments have commercial companies used successfully with their investors to justify "giving away" their data to the community for free?

@C21Beancounter

Take a look at: http://theodi.org/the-value-of-open-data

@npscience
Contributor

A wealth of use cases for open data is shown here: http://frictionlessdata.io/user-stories/ (the list is not exhaustive)

@npscience
Contributor

How open data is being re-used in the EU: https://www.europeandataportal.eu/en/highlights/how-open-data-being-re-used-europe

@rfinean
Author

rfinean commented Mar 4, 2017

Like a VC fund spreading its investment among hundreds of small companies (with the expectation that most will fail, a few will muddle along, but a couple will succeed spectacularly), 'giving away' our data on an 'attribution-only' basis allows thousands of researchers all over the world to use it for hundreds of different things, all findable by us because they cite their use of our data. Some of this work may well closely match our development goals and is likely to achieve results far more quickly than finding, hiring and managing such talent internally.

@rfinean
Author

rfinean commented Mar 5, 2017

After a lot of searching for my specific use case, 'human physiological observations data', I had to narrow it down to 'in critical care' before finding any data at all. Disappointingly, re3data.org didn't have any vitals measurement data sets. Finally I found http://mimic.physionet.org/, which includes data like what I'd like to publish. I actually came across MIMIC back in 2012 (see also this API) and am surprised that it is still the only repository of hospital observations data that I can find. MIMIC aspired to migrate to the Observational Medical Outcomes Partnership Common Data Model, which is a 2014 standard for this kind of data (SQL schemas all on GitHub).

MIMIC's approach to attribution is to ask researchers to cite, in every paper that uses the database, a key 2016 article in Nature that the MIMIC team wrote to announce it. That allows us to see the work of those who used the data in their research.

@Daniel-Mietchen
Collaborator

In case it's useful for #33 or #51, the Wikidata ID of that paper is Q28871995.

@rfinean
Author

rfinean commented Mar 5, 2017

In case it's useful for your tests, here is the article's IPython source: https://github.com/MIT-LCP/mimic-iii-paper/

@Daniel-Mietchen
Collaborator

@rfinean Thanks for the pointer - that is actually on our list (line 78), and I'll dive right into it.

@rfinean
Author

rfinean commented Mar 5, 2017

If we look into older articles about MIMIC-II (from 2011), we can see thousands more citations.

@Daniel-Mietchen
Collaborator

@rfinean @tompollard None of the three notebooks in https://github.com/MIT-LCP/mimic-iii-paper/tree/master/notebooks ran through without error. I only documented the first error for one of them.

@rfinean
Author

rfinean commented Mar 5, 2017

The Open Data Handbook is a good resource for advocating that data be published in a FAIR way.

@tompollard

Hi @Daniel-Mietchen, interesting to see this conversation here! Please could you point me to the issue that you had running the MIMIC-III notebook? It certainly was working and I'd like to fix it.

I assume the cause is either (1) updates to packages or (2) testing the notebook on the current version of MIMIC (v1.4) rather than the previous version it was written for.

@Daniel-Mietchen
Collaborator

@tompollard

Thanks @Daniel-Mietchen. The problem seems to be that the test user was trying to run a Python 2 notebook using Python 3.

All three notebooks gave:
"I couldn't find a kernel matching Python 2. Please select a kernel:"

I guess we could provide a virtual environment of some sort. Adding a requirements file to help with package install would be useful, so I'll try to get around to this.

The user would also need access to a password-protected dataset, so it's difficult to avoid a small amount of setup.
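
To make the kernel mismatch above visible before opening a notebook, a minimal sketch like the following could read the kernel a notebook declares in its metadata. It assumes the notebooks are stored in the nbformat 4 JSON layout (`metadata.kernelspec.name` and `metadata.language_info.version`); the function names and the example metadata are hypothetical and are not taken from the MIMIC-III repository.

```python
import json


def load_notebook(path):
    """Read an .ipynb file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def expected_kernel(notebook):
    """Return (kernel name, language version) declared in the notebook metadata.

    Either value is None when the notebook does not declare it.
    """
    meta = notebook.get("metadata", {})
    kernel = meta.get("kernelspec", {}).get("name")
    version = meta.get("language_info", {}).get("version")
    return kernel, version


# Hypothetical metadata, shaped like a notebook saved from a Python 2 kernel.
example = {
    "metadata": {
        "kernelspec": {"name": "python2", "display_name": "Python 2"},
        "language_info": {"name": "python", "version": "2.7.12"},
    },
    "cells": [],
    "nbformat": 4,
    "nbformat_minor": 0,
}

kernel, version = expected_kernel(example)
print("declared kernel:", kernel, "language version:", version)
```

For a file on disk, `expected_kernel(load_notebook("some_notebook.ipynb"))` would give the same pair (the path here is a placeholder). This only reports what the notebook asks for; it does not check which kernels are installed locally.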

@Daniel-Mietchen
Collaborator

Hi @tompollard,
the test user in this case was me, and the exercise here was just to see whether notebooks would run through, documenting the first problem if not.
In a second run, we will go over the corpus again, document all issues that pop up on the way and, to the extent possible, the steps needed to get the notebooks to run.
You are most welcome to join the effort over in issue 25.
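
As an illustration of "documenting the first problem", here is a rough sketch that scans an already executed notebook for the first error output. It is not the procedure used in this project; it assumes the notebook was run and saved with its outputs in the nbformat 4 layout, where a failed cell carries an output with `output_type` equal to `"error"` plus `ename` and `evalue` fields. The function name and sample data are hypothetical.

```python
def first_error(notebook):
    """Return details of the first error output in an executed notebook, or None."""
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                return {
                    "cell": index,
                    "ename": output.get("ename"),
                    "evalue": output.get("evalue"),
                }
    return None


# Hypothetical executed notebook: the second code cell failed on import.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "x = 1", "outputs": []},
        {
            "cell_type": "code",
            "source": "import psycopg2",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ImportError",
                    "evalue": "No module named psycopg2",
                    "traceback": [],
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}

report = first_error(sample)
print(report)
```

A second pass that documents all issues, as described above, could collect every matching output into a list instead of returning at the first one.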

5 participants