-
Notifications
You must be signed in to change notification settings - Fork 10
How to run Polypolish
bwa index draft.fasta
bwa mem -t 16 -a draft.fasta reads_1.fastq.gz > alignments_1.sam
bwa mem -t 16 -a draft.fasta reads_2.fastq.gz > alignments_2.sam
polypolish draft.fasta alignments_1.sam alignments_2.sam > polished.fasta
Most short-read sets are paired-end, so this will be the most common way to run Polypolish.
In these commands, I will assume your files are named like this:
-
draft.fasta
: the input assembly that you want to polish -
reads_1.fastq.gz
: short reads (first in pair) -
reads_2.fastq.gz
: short reads (second in pair)
The first step is to align the reads to your draft genome with BWA:
bwa index draft.fasta
bwa mem -t 16 -a draft.fasta reads_1.fastq.gz > alignments_1.sam
bwa mem -t 16 -a draft.fasta reads_2.fastq.gz > alignments_2.sam
Important things to note here:
- The
-a
option is key! This makes BWA align each read to all possible locations, not just the best location. Polypolish needs this to polish repeat sequences. - You have to align the two read files separately. I.e. don't align both read files with a single
bwa mem
command. This is because BWA's-a
option doesn't work with paired-end alignment. - I used
-t 16
in these commands to align using 16 threads. Adjust this value as is appropriate for your system. This will only affect the speed performance (the alignments themselves are unaffected by thread count).
Then give the draft genome and the alignments to Polypolish. It will output information to stderr and the polished assembly to stdout, so redirect its output to a file:
polypolish draft.fasta alignments_1.sam alignments_2.sam > polished.fasta
Polypolish works with any number of SAM files, so unpaired reads are easy:
bwa index draft.fasta
bwa mem -t 16 -a draft.fasta reads.fastq.gz > alignments.sam
polypolish draft.fasta alignments.sam > polished.fasta
And multiple paired-end read sets work too:
bwa index draft.fasta
bwa mem -t 16 -a draft.fasta reads_a_1.fastq.gz > alignments_a_1.sam
bwa mem -t 16 -a draft.fasta reads_a_2.fastq.gz > alignments_a_2.sam
bwa mem -t 16 -a draft.fasta reads_b_1.fastq.gz > alignments_b_1.sam
bwa mem -t 16 -a draft.fasta reads_b_2.fastq.gz > alignments_b_2.sam
polypolish draft.fasta alignments_*.sam > polished.fasta
Polypolish has a --debug
option which you can use to save per-base polishing information in a tab-delimited format. You can use it like this:
polypolish --debug polished.tsv draft.fasta alignments_1.sam alignments_2.sam > polished.fasta
The resulting file will have one line for each base of the assembly, so it will be hundreds of megabytes in size for a typical bacterial genome. See the Toy example page for an example of what this file looks like.
If you run polypolish --help
, you will see this:
_____ _ _ _ _
| __ \ | | | |(_) | |
| |__) |___ | | _ _ _ __ ___ | | _ ___ | |__
| ___// _ \ | || | | || '_ \ / _ \ | || |/ __|| '_ \
| | | (_) || || |_| || |_) || (_) || || |\__ \| | | |
|_| \___/ |_| \__, || .__/ \___/ |_||_||___/|_| |_|
__/ || |
|___/ |_|
Polypolish v0.4.2
short-read polishing of long-read assemblies
github.com/rrwick/Polypolish
USAGE:
polypolish [OPTIONS] <assembly> <sam>...
ARGS:
<assembly> Assembly to polish (FASTA format, one file)
<sam>... Short read alignments (SAM format, one or more files)
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
--debug <debug> Optional file to store per-base information for debugging
purposes
-m, --max_errors <max-errors> Ignore alignments with more than this many mismatches and
indels [default: 10]
-d, --min_depth <min-depth> A base must occur at least this many times in the pileup to be
considered valid [default: 5]
-f, --min_fraction <min-fraction> A base must make up at least this fraction of the read depth to
be considered valid [default: 0.5]
-r, --min_ratio <min-ratio> A base will only be changed if the best option occurs this many
times more often than the second-best option [default: 2.0]