- Neuroimaging Workflows & Statistics for reproducibility by Dorota Jarecka, Satrajit Ghosh, Celia Greenwood and Jean-Baptiste Poline at OHBM (3 hr 45 min)
- http://blogs.discovermagazine.com/neuroskeptic/2012/06/14/brains-are-different-on-macs/
- Same Data - Different Software - Different Results? Analytic Variability of Group fMRI Results.
There are a few options you can investigate to make your analysis more replicable and reproducible. On top of [sharing your data and your code](#sharing-your-code-data-and-your-results), you can use containers like docker or singularity, which allow you to run your analysis in a contained environment that comes with an operating system, the software you need and all of their dependencies.
In practice this means that by using a container:
- other researchers can reproduce your analysis on their own computer (e.g. you can run a linux container with freesurfer on your windows computer),
- you can reproduce your own analysis 5 years from now without having to figure out which version of the software you used.
If you want a brief conceptual introduction to containers and to the difference between containers and virtual machines, I recommend you start with these 2 posts:
- https://towardsdatascience.com/learn-enough-docker-to-be-useful-b7ba70caeb4b
- https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b
Neurodocker allows you to easily create a docker container suited to your neuroimaging analysis needs. There is a nice tutorial here on how to use it.
Code Ocean is a web-based service that relies on docker containers to let you run your analysis online. There is a post by Stephan Heunis describing how he did that with an SPM pipeline.
Another thing you can adopt is using notebooks like jupyter, jupyter lab or binder ( ??? ). Here is a fascinating talk by Fernando Perez, one of the people behind the jupyter project.
Neuroimaging Workflows & Statistics for reproducibility
- https://www.pathlms.com/ohbm/courses/8246/sections/12542/video_presentations/115885 Neuroinformatics and Replication: beyond BASH scripts and winner’s curses
- https://www.pathlms.com/ohbm/courses/8246/sections/12542/video_presentations/116085 Introduction to reproducible neuroimaging
- https://www.pathlms.com/ohbm/courses/8246/sections/12542/video_presentations/115884 Reproducibility and replicability: a practical approach
- https://www.pathlms.com/ohbm/courses/8246/sections/12538/video_presentations/116214
The open brain consent form tries to facilitate neuroimaging data sharing by providing an “out of the box” solution that addresses human-subjects concerns and consists of:
- a widely acceptable consent form allowing deposition of anonymized data in public data archives,
- a collection of tools/pipelines to help anonymize neuroimaging data and make it ready for sharing.
LICENSES: to help you decide which license to choose, start here.
Licenses don't apply to data.
https://gist.github.com/lukas-h/2a5d00690736b4c3a7ba
In general I suggest you have a look at some of the courses and material offered by the Carpentries for data and code.
For managing your code, if you don't already, I suggest you make version control with GIT part of your everyday workflow. GIT might seem scary and confusing at first but it is well worth the effort: the good news is that there are plenty of tutorials available (for example: here, there or there). Another advantage of using GIT is that it allows you to collaborate on many projects on github, but it already makes a lot of sense even simply at the scale of a lab.
Even though GIT is most powerful when used from the command line, there are also many graphic interfaces that might be just enough for what you need. Plus, a graphic interface can help you get started before you move on to using the command line only. There is no shame in using a GUI: just don't tell the GIT purists this is what you do, otherwise you will never hear the end of it.
https://medium.freecodecamp.org/how-to-use-badges-to-stop-feeling-like-a-noob-d4e6600d37d2
https://lgatto.github.io/github-intro/
Another good coding practice is having a consistent coding style. For python there is the PEP8 standard, and tools like pylint, pycodestyle, or pep8online can help you make sure that your code complies with it.
There is also black, which reformats your python code automatically: https://github.com/ambv/black
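To make this concrete, here is a small, made-up example of the kind of changes these style tools push you towards (the function names and variables are invented for illustration):

```python
# Made-up example: the same small function before and after applying
# PEP8 recommendations (naming, spacing, docstrings).

# Not PEP8 compliant: camelCase name, no spaces around the operator,
# cryptic argument names, no docstring.
def computeMeanRT(x,n):
    return sum(x)/n


# PEP8 compliant: snake_case name, spaces around operators,
# a descriptive argument name and a short docstring.
def compute_mean_rt(reaction_times):
    """Return the mean of a list of reaction times."""
    return sum(reaction_times) / len(reaction_times)
```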
You can also have a look at the code styles used by google for many languages (h/t Kelly Garner). You will notice that matlab is not in that list, so you might want to check this book instead: http://sci-hub.tw/https://www.cambridge.org/core/books/elements-of-matlab-style/8825411CE69013434DB0939780CFD907
For matlab, mlint and checkcode play a similar role:
- https://fr.mathworks.com/help/matlab/ref/mlint.html
- https://fr.mathworks.com/help/matlab/ref/checkcode.html
- https://blogs.mathworks.com/community/2008/09/08/let-m-lint-help-simplify-your-code/
Having a bug is annoying. Having your code run but give you an obviously wrong answer is more annoying. Having your code run and give you a plausible but wrong answer is scary (and potentially expensive when it crashes a spaceship onto a planet). Having your code run and give you the answer you want but not the true answer is the worst and keeps me up at night.
Selective debugging happens when we don't check the code that gives us the answer we want, but we do check it when it gives us an answer that goes against our expectations. In a way it is quite an insidious form of p-hacking.
There are some recent examples in neuroimaging.
Some things that can be done about it:
- organize code reviews in your lab: basically make sure that the code has been checked by another person. Pairing a beginner with a more senior member of the lab can also be a way to improve learning and skill transfer in the lab.
- test your code. These tests can be run automatically on your project by continuous integration services like Travis (see the sketch after this list).
- test your pipeline with positive and negative controls. A negative control tests your analysis by running it on random noise or on data that should contain no signal: the latter was the approach used by Anders Eklund and Tom Nichols in their cluster failure paper series. A positive control makes sure that your analysis can detect VERY obvious things it should detect (e.g. motor cortex activation following button presses, classifying responses to auditory versus visual stimuli in V1, …). Jo Etzel has a post about this.
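To give a rough idea of what the last two points can look like in practice, here is a minimal sketch using pytest and numpy. The `detect_activation` function and its threshold are hypothetical stand-ins for whatever your real pipeline does, and the false-positive bound in the negative control is an arbitrary illustrative choice.

```python
# Minimal sketch of automated tests with a positive and a negative control.
# Assumes numpy and pytest are installed; run with `pytest test_pipeline.py`.
# `detect_activation` is a toy, hypothetical stand-in for a real analysis.
import numpy as np


def detect_activation(time_series, threshold=3.0):
    """Toy analysis: report activation if any z-scored sample exceeds the threshold."""
    z_scores = (time_series - time_series.mean()) / time_series.std()
    return z_scores.max() > threshold


def test_positive_control():
    # An obvious spike in an otherwise flat signal should be detected.
    signal = np.zeros(100)
    signal[50] = 10.0
    assert detect_activation(signal)


def test_negative_control():
    # Pure noise should only rarely be flagged as activation:
    # check that the false-positive rate stays well below an (arbitrary) bound.
    rng = np.random.default_rng(seed=42)
    n_simulations = 500
    false_positives = sum(
        detect_activation(rng.standard_normal(100)) for _ in range(n_simulations)
    )
    assert false_positives / n_simulations < 0.3
```

Hooked up to a continuous integration service, tests like these run on every commit, so a change that silently breaks the pipeline gets flagged early.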
https://jupyter4edu.github.io/jupyter-edu-book/
Other good habits:
- a simple, transparent and systematic file-naming scheme is a good start (see the sketch after this list)
- if you have to deal with data in spreadsheets, I think you will enjoy this paper and this cookbook
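As a minimal sketch of what a systematic naming scheme buys you, here is a small helper that builds BIDS-inspired file names from their components; the exact entities and the function itself are made up for illustration:

```python
# Hypothetical helper for a systematic, BIDS-inspired naming scheme:
# every file name is built from the same ordered key-value pairs,
# which makes names easy to generate, sort and parse programmatically.

def build_filename(subject, session, task, suffix, extension="nii.gz"):
    """Build a name like 'sub-01_ses-02_task-rest_bold.nii.gz'."""
    return f"sub-{subject}_ses-{session}_task-{task}_{suffix}.{extension}"


print(build_filename(subject="01", session="02", task="rest", suffix="bold"))
# prints: sub-01_ses-02_task-rest_bold.nii.gz
```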
BIDS equivalent for psych data in general
https://medium.freecodecamp.org/10-common-data-structures-explained-with-videos-exercises-aaff6c06fb2b