This Snakemake workflow is for analysing genome re-sequencing experiments. It has two modes. The de-novo mode confirms sample relationships directly from the raw sequencing reads with kwip and mash. The varcall mode aligns reads to one or several reference genomes and then detects variants. Read alignment can be performed with bwa and/or NextGenMap, and variant calling with Freebayes and/or bcftools mpileup. These are currently the best-performing tools for re-sequencing large plant genomes. Between read alignment and variant calling, PCR duplicates are flagged with samtools markdup and indels are realigned with abra2. If a genome annotation is available, variants are annotated with snpEff.
- Norman Warthmann
- Marcos Conde
- Kevin Murray*
*Core functionality of this workflow is based on PaneucalyptShortReads
- Create a new GitHub repository in your GitHub account using this workflow as a template.
- Clone your newly created repository to the local system where you want to perform the analysis.
- Set up the software dependencies.
- Configure the workflow for your needs and input files.
- Run the workflow.
- Archive your workflow to document your work and make it easy to reproduce.
Some pointers for setting up, configuring, and running the workflow are below; for details, please consult the technical documentation.
An easy way to set up the dependencies is conda.
Get the Miniconda Python 3 distribution:
$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ conda install mamba
Create an environment with the required software:
NOTE: the conda environment name in these examples is dna-proto.
$ mamba env create --file envs/all-dependencies.yml
Activate the environment:
$ conda activate dna-proto
Additional useful conda commands are here.
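After activating the environment, it can be worth confirming that the tools the workflow relies on are actually on the PATH. The following is a minimal sketch, not part of the workflow: the function name check_tools is hypothetical, and the executable names in the example call are assumptions about how the installed packages name their binaries.

```shell
# Hypothetical helper (not part of the workflow): report which of the
# given executables are missing from the current PATH.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status is 0 if every tool was found, 1 otherwise.
    return "$missing"
}

# Example call; the executable names are assumptions, adjust as needed:
# check_tools snakemake bwa samtools bcftools freebayes mash kwip
```

The function prints one line per missing tool and returns a non-zero status if any is missing, so it can be used as a guard in a script.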
We provide scripts to list metadata and configuration parameters in utils/.
python utils/check_metadata.py
python utils/check_config.py
You can inspect the workflow in graphical form by rendering its directed acyclic graph (DAG) of jobs:
snakemake --dag -npr -j -1 | dot -Tsvg > dag.svg
eog dag.svg
Before running the workflow, do a dry run and confirm that it will do what is intended.
snakemake -npr
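The dry run and the real run can be chained so that jobs only start when the dry run succeeds. This is a minimal sketch under stated assumptions: the function name run_workflow and the default of 4 cores are illustrative and not part of the workflow, and the -n, -p, and -r flags are simply the ones used in the dry-run command above, with -j setting the number of cores.

```shell
# Hypothetical wrapper: do a dry run first and start the real run only
# if the dry run exits successfully.
run_workflow() {
    local cores="${1:-4}"   # number of cores; 4 is an arbitrary default
    snakemake -npr || { echo "dry run failed, not starting" >&2; return 1; }
    snakemake -pr -j "$cores"
}

# Example call with 8 cores:
# run_workflow 8
```

It is still worth reading the dry-run output yourself; a successful exit status only shows that Snakemake could plan the jobs, not that the plan is the one you intended.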
Main directory content:
.
├── envs
├── genomes_and_annotations
├── metadata
├── output
├── rules
├── scripts
├── utils
├── config.yml
├── Snakefile
└── snpEff.config
NOTE: the output directory and some files in the metadata directory are or will be generated by the workflow.
You will need to configure the workflow for your specific project. For details, see the technical documentation. The following files and directories will need editing:
- Snakefile
- genomes_and_annotations/
- metadata/
- config.yml
- snpEff.config
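Before editing, a quick check that these files and directories are present can catch an incomplete clone. The sketch below is an illustration only: the function name check_layout is hypothetical, and the list of paths is the one given above.

```shell
# Hypothetical helper: verify that the files and directories that need
# editing exist under the given workflow directory (default: current).
check_layout() {
    local root="${1:-.}"
    local status=0
    local path
    for path in Snakefile genomes_and_annotations metadata config.yml snpEff.config; do
        if [ ! -e "$root/$path" ]; then
            echo "not found: $path"
            status=1
        fi
    done
    # Exit status is 0 if everything exists, 1 otherwise.
    return "$status"
}

# Example call from the top-level directory of the clone:
# check_layout .
```

The check only tests for existence; whether the contents are configured correctly is a separate question.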
You can download example data for testing the workflow.
--
Fork this repository by clicking the fork button at the top of this page. This will create a copy of this repository in your GitHub account (not on your computer).
Now clone the forked repository to your machine.
Go to your GitHub account, open the forked repository, click the clone button, and then click the copy-to-clipboard icon. The URL will look like https://github.com/your-username/dna-proto-workflow.git, where your-username is your GitHub username.
Open a terminal and run the following git command:
git clone https://github.com/your-username/dna-proto-workflow.git
Once you've cloned your fork, you can edit your local copy. However, if you want to contribute, you'll need to create a new branch.
Change to the repository directory on your computer (if you are not already there):
NOTE: Don't change the name of this directory!
cd dna-proto-workflow
You can list your branches and see which one is active with the git branch command.
git branch -a
Now create a branch using the git checkout command:
git checkout -b new-branch-name
For example:
git checkout -b development
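From this point, you are on the new branch and edits only affect your branch. If things go wrong, you can delete your branch. Switch to another branch first, since git will not delete the branch you are currently on, and use -D instead of -d if the branch holds commits that were never merged: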
git branch -d name-of-the-branch
Or switch back to the master branch using
git checkout master
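Because git refuses to delete the branch that is currently checked out, the two commands are usually run in the opposite order when discarding a branch. A minimal sketch, assuming the main branch is called master as in this document; the function name discard_branch is hypothetical:

```shell
# Hypothetical helper: switch to master first, then delete the named
# branch. Stops if the checkout fails (e.g. uncommitted conflicts).
discard_branch() {
    local branch="$1"
    git checkout master || return 1
    git branch -d "$branch"
}

# Example call:
# discard_branch development
```

Note that git branch -d only deletes branches whose commits are already merged; this is a safety net, so the sketch keeps -d rather than forcing deletion.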
Once you've modified something, you can confirm that there are changes with git status (called from the top-level directory).
Add those changes to your branch with git add:
git status
git add .
or
git add name_of_the_file_you_modified
Commit those changes with git commit:
git commit -m "write a message"
Push the changes in your local copy (on your machine) to your remote repository on GitHub with git push:
git push origin your-branch-name
replacing your-branch-name with the name of the branch you created earlier (e.g., development).
In your repository on GitHub, click the Compare & pull request button.
Now submit the pull request.
We will be notified and can then review your changes and merge them into this project (in general, into the master branch).