- Neuroimaging Workflows & Statistics for reproducibility by Dorota Jarecka, Satrajit Ghosh, Celia Greenwood and Jean-Baptiste Poline at OHBM (3 hr 45 min)
http://blogs.discovermagazine.com/neuroskeptic/2012/06/14/brains-are-different-on-macs/
Same Data - Different Software - Different Results? Analytic Variability of Group fMRI Results. https://www.pathlms.com/ohbm/courses/8246/sections/12541/video_presentations/116000
There are a few options you can investigate to make your analysis more replicable and reproducible. On top of [sharing your data and your code](#sharing-your-code-data-and-your-results), you can use containers like Docker or Singularity, which allow you to run your analysis in a contained environment that has an operating system, the software you need and all its dependencies.
In practice this means that by using such a container:
- other researchers can reproduce your analysis on their own computer (e.g. you can run a Linux container with FreeSurfer on your Windows computer),
- you can reproduce your own analysis 5 years from now without having to puzzle out which version of the software you used.
If you want a brief conceptual introduction to containers and to the difference between containers and virtual machines, I recommend you start with these two posts: https://towardsdatascience.com/learn-enough-docker-to-be-useful-b7ba70caeb4b https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b
Neurodocker allows you to easily create a Docker container suited to your neuroimaging analysis needs. There is a nice tutorial here on how to use it.
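Once you have an image, you can even drive the container from a script. Here is a minimal sketch using the Docker SDK for Python (pip install docker); the image name, command and data path below are placeholders, not anything prescribed by Neurodocker:

```python
# Run a command inside a container from Python with the Docker SDK
# (pip install docker; requires a running Docker daemon).
# The image, command and host path are illustrative placeholders.
import docker

client = docker.from_env()
output = client.containers.run(
    image="ubuntu:22.04",  # in practice: an image with your analysis tools baked in
    command="echo 'analysis ran in a fixed environment'",
    volumes={"/home/me/data": {"bind": "/data", "mode": "ro"}},  # hypothetical path
    remove=True,  # clean up the container afterwards
)
print(output.decode())
```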
Code Ocean is a web-based service that relies on Docker containers to let you run your analysis online. There is a post by Stephan Heunis describing how he did this with an SPM pipeline.
Another thing you can implement is using notebooks like Jupyter, JupyterLab or Binder ( ??? ). Here is a fascinating talk by Fernando Perez, one of the people behind the Jupyter project.
- Neuroimaging Workflows & Statistics for reproducibility: https://www.pathlms.com/ohbm/courses/8246/sections/12542/video_presentations/115885
- Neuroinformatics and Replication: beyond BASH scripts and winner’s curses: https://www.pathlms.com/ohbm/courses/8246/sections/12542/video_presentations/116085
- Introduction to reproducible neuroimaging: https://www.pathlms.com/ohbm/courses/8246/sections/12542/video_presentations/115884
- Reproducibility and replicability: a practical approach: https://www.pathlms.com/ohbm/courses/8246/sections/12538/video_presentations/116214
The Open Brain Consent form tries to facilitate neuroimaging data sharing by providing an “out of the box” solution that addresses human subjects concerns and consists of:
- a widely acceptable consent form allowing deposition of anonymized data in public data archives,
- a collection of tools/pipelines to help anonymize neuroimaging data and make it ready for sharing.
LICENSES: to help you decide which license to choose, start here.
Licenses don't apply to data.
https://gist.github.com/lukas-h/2a5d00690736b4c3a7ba
In general I suggest you have a look at some of the courses and material offered by the Carpentries for data and code.
http://nikola.me/folder_structure.html https://drivendata.github.io/cookiecutter-data-science/
For managing your code, if you don't do so already, I suggest you make version control with Git part of your everyday workflow. Git might seem scary and confusing at first, but it is well worth the effort: the good news is that there are plenty of tutorials available (for example: here, there or there). Another advantage of Git is that it lets you collaborate on many projects on GitHub, but it already makes a lot of sense even simply at the scale of a lab.
Even though Git is most powerful when used from the command line, there are also many graphical interfaces that might be just enough for what you need. Plus, a graphical interface can help you get started before you move on to using the command line only. There is no shame in using a GUI: just don't tell the Git purists that this is what you do, otherwise you will never hear the end of it.
https://medium.freecodecamp.org/how-to-use-badges-to-stop-feeling-like-a-noob-d4e6600d37d2
https://lgatto.github.io/github-intro/
Another good coding practice is keeping a consistent coding style. For Python there is the PEP8 standard, and tools like pylint, pycodestyle, or pep8online can help you make sure your code complies with it.
https://github.com/ambv/black
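Tools like pycodestyle can also be run from within a script rather than the command line. A minimal sketch using its Python API (the file name is a placeholder):

```python
# Check a script against PEP8 with pycodestyle's Python API
# (pip install pycodestyle; "my_analysis.py" is a placeholder).
import pycodestyle

style = pycodestyle.StyleGuide()
report = style.check_files(["my_analysis.py"])
print(f"{report.total_errors} style problem(s) found")
```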
You can also have a look at the code styles used by Google for many languages (h/t Kelly Garner). You will notice that MATLAB is not on the list, so you might want to check this instead: http://sci-hub.tw/https://www.cambridge.org/core/books/elements-of-matlab-style/8825411CE69013434DB0939780CFD907
For MATLAB, mlint and checkcode can help: https://fr.mathworks.com/help/matlab/ref/mlint.html https://fr.mathworks.com/help/matlab/ref/checkcode.html https://blogs.mathworks.com/community/2008/09/08/let-m-lint-help-simplify-your-code/
Having a bug is annoying. Having your code run but give you an obviously wrong answer is more annoying. Having your code run and give you a plausible but wrong answer is scary (and potentially expensive, when it crashes a spacecraft into a planet). Having your code run and give you the answer you want, but not the true answer, is the worst: that one keeps me up at night.
Selective debugging happens when we don't check the code that gives us the answer we want, but do check it when it gives us an answer that goes against our expectations. In a way, it is quite an insidious form of p-hacking.
There are some recent examples in neuroimaging.
Some things that can be done about it:
- organize code reviews in your lab: basically, make sure the code has been checked by another person. Pairing a beginner with a more senior lab member can also improve learning and skill transfer in the lab.
- test your code. Tests can be run automatically on your project by continuous integration services like Travis.
- test your pipeline with positive and negative controls. A negative control tests your analysis by running it on random noise or on data that should contain no signal; the latter was the approach used by Anders Eklund and Tom Nichols in their cluster failure paper series. A positive control makes sure that your analysis can detect VERY obvious things it should detect (e.g. motor cortex activation following button presses, or classifying responses to auditory versus visual stimuli in V1, …). Jo Etzel has a post about this. A minimal sketch of both ideas follows this list.
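As a sketch of what such controls could look like for a very simple group analysis (a one-sample t-test across subjects; the data shapes, effect size and thresholds are arbitrary assumptions on my part), runnable with pytest:

```python
# Positive and negative controls for a toy group analysis
# (needs numpy, scipy; run with pytest).
import numpy as np
from scipy import stats

RNG = np.random.default_rng(0)
N_SUBJECTS, N_VOXELS, ALPHA = 20, 1000, 0.05

def group_ttest(data):
    """Return uncorrected p-values of a one-sample t-test over subjects."""
    return stats.ttest_1samp(data, popmean=0, axis=0).pvalue

def test_negative_control():
    # Pure noise: the false-positive rate should stay close to alpha.
    noise = RNG.standard_normal((N_SUBJECTS, N_VOXELS))
    fpr = np.mean(group_ttest(noise) < ALPHA)
    assert abs(fpr - ALPHA) < 0.02

def test_positive_control():
    # A strong injected signal: the pipeline must detect it almost everywhere.
    signal = RNG.standard_normal((N_SUBJECTS, N_VOXELS)) + 1.5
    assert np.mean(group_ttest(signal) < ALPHA) > 0.95
```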
https://jupyter4edu.github.io/jupyter-edu-book/
If you are going to do some fMRI analysis you will quickly drown in data if you are not a bit organized, so I highly recommend you use the Brain Imaging Data Structure standard (BIDS) to organize your data. The current version of BIDS only covers raw data, but it should soon cover derivatives (e.g. preprocessed data) too. In general BIDS also allows you to more easily share your data and use plenty of analytical tools.
If you would like to use BIDS but have no idea what a JSON file is, or if the length of the specification document scares you, head over to the BIDS starter kit to find tutorials and scripts to help you rearrange your data.
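One payoff of a BIDS layout is that tools can then query your dataset programmatically. A minimal sketch with the pybids package (pip install pybids); the dataset path and subject label are placeholders:

```python
# Query a BIDS dataset with pybids (pip install pybids).
# The dataset path and subject label below are hypothetical.
from bids import BIDSLayout

layout = BIDSLayout("/data/my_bids_dataset")
print(layout)  # summary: subjects, sessions, runs found

# All BOLD files for subject 01, returned as filenames.
bold_files = layout.get(subject="01", suffix="bold",
                        extension=".nii.gz", return_type="filename")

# Repetition time from the JSON sidecar of the first run.
tr = layout.get_metadata(bold_files[0])["RepetitionTime"]
```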
DataLad is to data what Git is to code. It allows curation and version control of data, but also lets you crawl databases to explore and download data from them, and it facilitates data sharing. Several of these features are described here with scripts that act as tutorials. There are video presentations of it there.
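DataLad can be driven from the command line or from Python. A minimal sketch using its Python API (assuming DataLad is installed; the paths and URL below are hypothetical):

```python
# Version-control a data directory with DataLad's Python API
# (pip install datalad; paths and URL here are hypothetical).
import datalad.api as dl

# Create a new dataset, then record a change with a commit message.
ds = dl.create(path="my_dataset")
# ... put files under my_dataset/ ...
dl.save(dataset="my_dataset", message="add raw bold runs")

# Clone a published dataset and fetch the content of one file on demand.
dl.clone(source="https://example.org/some_dataset", path="some_dataset")
dl.get("some_dataset/sub-01/anat/sub-01_T1w.nii.gz")
```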
Having a standard way to organize not only your data but also your code, results, documentation... from the beginning of a project can go a long way towards saving you a lot of time down the line (when communicating within or outside your lab, or when you have to wrap things up when moving to a new project/job). The YODA template is a folder structure recommended by ReproNim that you can use.
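Setting up such a skeleton can even be scripted at the start of every project. A minimal sketch (the folder names below are only an illustration loosely inspired by YODA, not the official template):

```python
# Create a minimal project skeleton (layout is illustrative, loosely
# inspired by the YODA principles; adjust names to your lab's conventions).
from pathlib import Path

def init_project(root):
    root = Path(root)
    for sub in ["code", "data", "results", "docs"]:
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "README.md").write_text("# Project\n\nDescribe the project here.\n")

init_project("my_new_project")
```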
https://eglerean.wordpress.com/2017/05/24/project-management-data-management/
Other good habits:
- a simple, transparent and systematic file naming scheme is a good start (see the sketch after this list)
- if you have to deal with data in spreadsheets, I think you will enjoy this paper and this cookbook
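A systematic naming scheme can be as simple as one helper function that builds every filename the same way, so names stay sortable and easy to parse. A sketch (the BIDS-like key-value pattern is just one possible convention):

```python
# Build filenames systematically so they sort and parse predictably.
# The key-value pattern below (BIDS-like) is just one possible convention.
def make_filename(subject, task, run, suffix, ext="nii.gz"):
    return f"sub-{subject:02d}_task-{task}_run-{run:02d}_{suffix}.{ext}"

print(make_filename(1, "rest", 2, "bold"))
# -> sub-01_task-rest_run-02_bold.nii.gz
```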
BIDS equivalent for psych data in general.
https://medium.freecodecamp.org/10-common-data-structures-explained-with-videos-exercises-aaff6c06fb2b
It is often said:
Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.
Proper documentation of a project and good commenting of your code will help others use it or pick it up later. But there are good selfish reasons to document your project and comment your code too: it will most likely help future you when you have to respond to reviewers, or when you want to check something in that dataset or in that function you used 6 months ago.
- Most likely, you will have to re-run your analysis more than once.
- In the future, you or a collaborator may have to re-visit part of the project.
- Your most likely collaborator is your future self, and your past self doesn’t answer emails.
See here for more.
There are plenty of recommendations out there about writing documentation. I found this one useful, as well as this list and this checklist, which are more specific to README files.
In terms of code, I guess the ideal is self-documenting code. Read the Docs is a good option that also allows for continuous integration. Python also has this thing called Sphinx that helps create intelligent and beautiful documentation (that alone should make MATLAB users envious), and there are ways to make it part of continuous integration.
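In the meantime, consistently written docstrings already go a long way, and Sphinx can render them into HTML. A minimal sketch in the NumPy docstring style (the function itself is a made-up example):

```python
def scale_signal(timeseries, factor=1.0):
    """Scale an fMRI time series by a constant factor.

    Parameters
    ----------
    timeseries : list of float
        The signal values, one per volume.
    factor : float, optional
        Multiplicative scaling factor (default 1.0).

    Returns
    -------
    list of float
        The scaled time series.
    """
    return [value * factor for value in timeseries]
```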