GK Fragiadakis and the Students of BMS225A January 5th, 2021
The goal of this workshop is to lower the barrier to entry for biological data science and to instill good practices along the way. We will cover:
- Principles of reproducible research
- Intro to version control
- Data exploration and resources
Resources:
- installing R: https://www.r-project.org/
- installing RStudio: https://rstudio.com/products/rstudio/download/
- Karl Broman's Tools for Reproducible Research and initial steps towards reproducible research
- Git resources: Karl Broman's GitHub Guide and GitHub's guides
- R package resources: Hilary Parker's guide (simplest), Karl Broman's R package primer, and Hadley Wickham's R package book
Principles:
- Organize your data and code
- Everything with a script
- Use version control
- Turn repeated code into functions (and other good coding practices)
- Turn scripts into reproducible reports
- Package functions for future use
In this workshop we will cover an introduction to version control using git on GitHub.
- Create repo on GitHub
- clone repo locally
git clone repo-url
- locally create a branch
git checkout -b branch-name
- to see which branch you're on and what exists:
git branch
- to switch between branches:
git checkout branch-name
- make changes on that branch
- Add commits on that branch
git status
(will show you which files have changes and whether they are staged)
git add file-name
(staging your file)
git commit -m "commit description"
- push that branch to GitHub: push commits every time you come to a stopping point (at least each day)
git push origin branch-name
- when ready, create pull request on GitHub
- review on GitHub
- merge branch to master
- delete branch
- then locally, pull down master
git pull origin master
- delete branch locally
git branch -d branch-name
- Repeat the cycle from step 3 (locally create a branch) for your next change
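The cycle above can be rehearsed end to end without touching GitHub. The following is only a sketch under stated assumptions: a local bare repository stands in for the GitHub repo, the pull request and merge are imitated by fast-forwarding master on that stand-in remote, and the branch name add-analysis, the file analysis.R, and the user identity are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: a local bare repository plays the role of the GitHub repo.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

# Clone the repo locally (cloning an empty repo prints a harmless warning).
git clone --quiet "$work/remote.git" "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master        # make sure the first commit lands on master
git config user.name "Workshop Student"        # hypothetical identity
git config user.email "student@example.com"

# Seed master with a first commit, much like a GitHub repo created with a README.
echo "# demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet origin master

# Create a branch, make a change, stage it, commit it, and push the branch.
git checkout --quiet -b add-analysis
echo 'print("hello")' > analysis.R
git status --short
git add analysis.R
git commit --quiet -m "Add analysis script"
git push --quiet origin add-analysis

# Stand-in for "pull request, review, merge, delete branch" on GitHub:
# fast-forward master on the remote, then delete the remote branch.
git -C "$work/remote.git" update-ref refs/heads/master "$(git rev-parse add-analysis)"
git -C "$work/remote.git" branch --quiet -d add-analysis

# Back on the local side: pull down master and delete the local branch.
git checkout --quiet master
git pull --quiet origin master
git branch -d add-analysis
git log --oneline
```

With a real GitHub repo, the two lines acting on remote.git are replaced by clicking through the pull request on the website; everything else is typed exactly as in the steps above.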
Additional tips:
Make a .gitignore file listing the files you want git to ignore:
touch .gitignore
- write in the names of files, or patterns such as *.pdf, that you don't want git to track or show in git status
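As a rough illustration of how ignore patterns behave, the sketch below builds a throwaway repository and checks which files git skips. The patterns and file names (figure.pdf, notes.txt, data/raw/) are hypothetical examples, not a recommended list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository in a temporary directory.
demo=$(mktemp -d)
cd "$demo"
git init --quiet

# Hypothetical patterns: all PDFs, R's history file, and a raw-data folder.
printf '%s\n' '*.pdf' '.Rhistory' 'data/raw/' > .gitignore

touch figure.pdf notes.txt

# figure.pdf no longer appears as untracked; notes.txt and .gitignore do.
git status --short

# git check-ignore reports which pattern (if any) matches a path.
git check-ignore -v figure.pdf
if ! git check-ignore -q notes.txt; then
  echo "notes.txt is not ignored"
fi
```

The .gitignore file itself is usually committed, so that everyone who clones the repo ignores the same files.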
To see changes since the last commit (both staged and unstaged):
git diff HEAD
To un-stage a file:
git reset name-of-file
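The difference between staged and unstaged changes is easier to see in a small experiment. This sketch uses a throwaway repository with a hypothetical file notes.txt and user identity; it stages a change, compares plain git diff with git diff HEAD, and then un-stages the file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository with one committed file.
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git config user.name "Workshop Student"      # hypothetical identity
git config user.email "student@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Change the file and stage the change.
echo "second line" >> notes.txt
git add notes.txt

git diff --stat          # prints nothing: no unstaged changes remain
git diff --stat HEAD     # lists notes.txt: compares against the last commit
git status --short       # "M  notes.txt" (change is staged)

# Un-stage the file; the edit itself stays in the working directory.
git reset --quiet notes.txt
git status --short       # " M notes.txt" (change is no longer staged)
```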
Making a repository locally instead:
git init
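A repository that starts locally can later be connected to GitHub. The sketch below is one possible sequence, not the only one; the project name, user identity, and remote URL are placeholders and would need to match a repo you have actually created on GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical project folder inside a temporary directory.
demo=$(mktemp -d)/my-project
mkdir -p "$demo"
cd "$demo"

git init --quiet
git symbolic-ref HEAD refs/heads/master      # name the first branch master, as in this workshop
git config user.name "Workshop Student"      # hypothetical identity
git config user.email "student@example.com"

echo "# my-project" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Placeholder URL: replace with the address of your own empty GitHub repo.
git remote add origin "https://github.com/your-username/my-project.git"
git remote -v
```

Once an empty repo exists on GitHub at that address, git push origin master uploads the local history, and the branch workflow above applies from there.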
Resources
- Hadley Wickham's R for Data Science
- Pre-process and tidy your data
- Explore your data using the Transform-Visualize-Model loop
- Communicate results
CyTOF resources:
- data pre-processing (normalization, debarcoding, sample cleanup): premessa
- gating tools: Cytobank, CellEngine
scRNAseq resources:
- pre-processing and many analyses in Seurat vignettes
- some background reading on single-cell analysis
We covered a lot. Now it's time to try it on your own, and to reach out if you have further questions as you go.