Input:
- Reference genome sequence file
- Reference genome annotation file (in `.gtf` format)
- RNA-seq fastq files

Output:
- Gene expression count matrix
```
├── script
│ └── snake_pipeline
├── raw_data
├── genome_index
└── logs
```
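The layout above can be created in one step. This is a minimal bash sketch. It assumes a `working_dir` variable, the same placeholder used in the Snakemake commands further down, which you set to your own project path. If the variable is unset, the script falls back to `./project`.

```shell
#!/usr/bin/env bash
# Sketch: create the suggested folder layout.
# "working_dir" is a placeholder; set it to your own project path first.
set -euo pipefail

working_dir="${working_dir:-$PWD/project}"

mkdir -p \
  "${working_dir}/script/snake_pipeline" \
  "${working_dir}/raw_data" \
  "${working_dir}/genome_index" \
  "${working_dir}/logs"
```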
Please store your fastq files in the `raw_data/` folder and the genome files in the `genome_index/` folder. Script files, pipeline files and configuration files can be organized however you prefer.
The config file must, however, be placed in the same folder as the Snakefile.
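The pipeline expects the config file to sit next to the Snakefile, so a quick pre-flight check can catch a misplaced file before a run. This is a minimal bash sketch. `check_config_location` is a hypothetical helper name, and `SNPcalling_config.yaml` is the config file name used elsewhere in this guide.

```shell
# Sketch: fail early if the config file is not next to the Snakefile.
# Arguments: path to the Snakefile, file name of the config
# (e.g. SNPcalling_config.yaml).
check_config_location() {
  local snakefile="$1" config_name="$2"
  local snake_dir
  snake_dir="$(dirname "$snakefile")"

  if [ ! -f "$snakefile" ]; then
    echo "Snakefile not found: $snakefile" >&2
    return 1
  fi
  if [ ! -f "${snake_dir}/${config_name}" ]; then
    echo "Config file not found next to the Snakefile: ${snake_dir}/${config_name}" >&2
    return 1
  fi
  echo "OK: ${snake_dir}/${config_name}"
}
```

Usage: `check_config_location "${snakefile}" SNPcalling_config.yaml`.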
```yaml
# Absolute path to the genome fasta file
ref: "/workingdir/genome_index/genome.fasta"
```
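Before you put a path into `ref`, you can check it with a small bash sketch. The sketch confirms that the path is absolute, that the file exists and is non-empty, and that it looks like FASTA. It assumes an uncompressed FASTA file whose first character is `>`. `check_ref` is a hypothetical helper name.

```shell
# Sketch: sanity-check the "ref" value before putting it in the config.
# Assumes an uncompressed FASTA file (first character is '>').
check_ref() {
  local ref="$1"
  case "$ref" in
    /*) ;;  # absolute path: OK
    *) echo "ref must be an absolute path: $ref" >&2; return 1 ;;
  esac
  if [ ! -s "$ref" ]; then
    echo "Genome FASTA is missing or empty: $ref" >&2
    return 1
  fi
  if [ "$(head -c 1 "$ref")" != ">" ]; then
    echo "Does not look like an uncompressed FASTA file: $ref" >&2
    return 1
  fi
  echo "OK: $ref"
}
```

Usage: `check_ref /workingdir/genome_index/genome.fasta`.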
2.2 Fastq files may end with either `.fastq.gz` or `.fq.gz`. If necessary, specify the suffix your files use.
```yaml
# Fastq file suffix
fastq_suffix: ".fq.gz"  # Default value is ".fq.gz"; use ".fastq.gz" if your files end that way
```
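If you are unsure which suffix your files use, you can count them. This is a minimal bash sketch. `count_fastq_suffixes` is a hypothetical helper that only looks at the two suffixes mentioned above, and it reports counts for the folder you pass, typically `raw_data/`.

```shell
# Sketch: count fastq files in a folder by suffix, to help choose fastq_suffix.
# Only the two suffixes mentioned in this guide are checked.
count_fastq_suffixes() {
  local fastq_dir="$1" suffix n
  for suffix in .fastq.gz .fq.gz; do
    # -maxdepth 1: look only at the folder itself, not subfolders
    n=$(find "$fastq_dir" -maxdepth 1 -type f -name "*${suffix}" | wc -l | tr -d ' ')
    echo "${suffix}: ${n}"
  done
}
```

Usage: `count_fastq_suffixes raw_data`. Set `fastq_suffix` to the suffix with a non-zero count.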
```yaml
# Sample list; sample names must start with a letter.
sample:
  - "sample1"
  - "sample2"
  - "sample3"
  - "sample4"
  # ...
  - "samplen"
```
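If your sample names are listed in a text file (for example `sample.list`, one name per line), you can append them to the config file with the `awk` command below. Because `>>` appends to the end of the file, make sure `sample:` is the last key in the config and remove the placeholder entries first.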
Example `sample.list`:

```
sample1
sample2
sample3
sample4
```
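Sample names must start with a letter, so you may want to check the list before adding it to the config. This is a minimal bash sketch. `check_sample_names` is a hypothetical helper. It prints every offending line with its line number, including empty lines, and returns a non-zero status if any are found.

```shell
# Sketch: list sample names that do not start with a letter.
# Prints offending lines with line numbers and returns 1 if any are found.
check_sample_names() {
  local list_file="$1"
  # grep -v selects lines that do NOT start with a letter (empty lines included);
  # it exits 0 only when at least one such line exists.
  if grep -vn '^[A-Za-z]' "$list_file"; then
    echo "ERROR: the sample names above must start with a letter." >&2
    return 1
  fi
  echo "All sample names start with a letter."
}
```

Usage: `check_sample_names sample.list`.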
```shell
# Add samples to the config file:
awk '{print "  - \"" $0 "\""}' sample.list >> "${working_dir}/SNPcalling_config.yaml"
```
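If you do not have a `sample.list` yet, you can derive one from the fastq file names. This is a minimal bash sketch under a strong assumption: each sample has exactly one file named `<sample><suffix>`, for example `raw_data/sample1.fq.gz`. Paired-end naming such as `sample1_1.fq.gz` / `sample1_2.fq.gz` would need extra handling. `make_sample_list` is a hypothetical helper name.

```shell
# Sketch: build sample.list from fastq file names.
# ASSUMPTION: one file per sample named <sample><suffix>, e.g. sample1.fq.gz.
# Paired-end naming (e.g. sample1_1.fq.gz / sample1_2.fq.gz) needs extra handling.
make_sample_list() {
  local fastq_dir="$1" suffix="$2" f
  for f in "$fastq_dir"/*"$suffix"; do
    [ -e "$f" ] || continue          # no match: the glob stays literal, skip it
    f="${f##*/}"                     # drop the directory part
    printf '%s\n' "${f%"$suffix"}"   # drop the suffix
  done
}
```

Usage: `make_sample_list raw_data .fq.gz > sample.list`. Then run the `awk` command above.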
Run the pipeline with Snakemake, for example:

```shell
snakemake \
  --snakefile "${snakefile}" \
  -d "${working_dir}" \
  --cores "${cores_num}"
```
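Before the real run, a dry run lists the jobs Snakemake would execute without running them. This helps catch a wrong path or sample name. This is a minimal bash sketch that wraps the same command with the standard Snakemake options `--dry-run` and `--printshellcmds`. `dry_run` is a hypothetical helper name.

```shell
# Sketch: preview the jobs Snakemake would run, without executing them.
# --dry-run (-n) and --printshellcmds (-p) are standard Snakemake options.
dry_run() {
  local snakefile="$1" working_dir="$2" cores_num="$3"
  snakemake \
    --snakefile "$snakefile" \
    -d "$working_dir" \
    --cores "$cores_num" \
    --dry-run \
    --printshellcmds
}
```

Usage: `dry_run "${snakefile}" "${working_dir}" "${cores_num}"`. If the job list looks right, run the command above without the two extra options.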