Skip to content

Commit

Permalink
The version to appear in arXiv as the 2nd version
Browse files Browse the repository at this point in the history
The supplementary is for a record only. Some of the content has been moved back
to the preprint.
  • Loading branch information
lh3 committed May 26, 2013
1 parent 474a43e commit af132e4
Show file tree
Hide file tree
Showing 8 changed files with 209 additions and 83 deletions.
4 changes: 2 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@
.eps.gz.pdf:
gzip -dc $< | epstopdf --filter > $@

all:bwamem.pdf mem-supp.pdf
all:bwamem.pdf

bwamem.pdf:bwamem.tex bwamem.bib alnroc-se.pdf alnroc-pe.pdf
bwamem.pdf:bwamem.tex bwamem.bib alnroc-color-se.pdf alnroc-color-pe.pdf
pdflatex bwamem; bibtex bwamem; pdflatex bwamem; pdflatex bwamem;

mem-supp.pdf:mem-supp.tex alnroc-color-se.pdf alnroc-color-pe.pdf
Expand Down
10 changes: 5 additions & 5 deletions bioinfo.cls
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
\newcommand\classname{bioinfo}
\newcommand\classname{bioinfo2}
\newcommand\lastmodifieddate{2003/02/08}
\newcommand\versionnumber{0.1}

Expand Down Expand Up @@ -454,7 +454,7 @@
\vspace*{\aboveskipchk}%
\vspace{\dropfromtop}%
\hbox to \textwidth{%
{\helvetica\itshape\bfseries\fontsize{19}{12}\selectfont {\color{gray}BIOINFORMATICS}
{\helvetica\itshape\bfseries\fontsize{19}{12}\selectfont {\color{gray}PREPRINT}
\hfil
\if@appnotes APPLICATIONS NOTE\hfil\fi
}%
Expand All @@ -475,9 +475,9 @@
\vspace{4\p@}
{\helvetica\fontsize{10}{12}\selectfont\raggedright \@address \par}%
\vspace{4\p@}
{\helvetica\fontsize{8}{10}\selectfont\raggedright \@history \par}
\vspace{24\p@}
{\helvetica\fontsize{10}{12}\selectfont\raggedright \@editor \par}
%{\helvetica\fontsize{8}{10}\selectfont\raggedright \@history \par}
%\vspace{24\p@}
%{\helvetica\fontsize{10}{12}\selectfont\raggedright \@editor \par}
%\vspace{20\p@}
}%
}
Expand Down
49 changes: 27 additions & 22 deletions bwamem.tex
Original file line number Diff line number Diff line change
Expand Up @@ -124,23 +124,27 @@ \subsubsection{Seed extension} We rank a seed by the length of the chain it

\begin{figure}[tb]
\centering
\includegraphics[width=0.4\textwidth]{alnroc-se}\\
\includegraphics[width=0.4\textwidth]{alnroc-pe}
\includegraphics[width=0.5\textwidth]{alnroc-color-se}\\
\includegraphics[width=0.5\textwidth]{alnroc-color-pe}
\caption{Percent mapped reads as a function of the false alignment rate under
different mapping quality cutoff. Alignments with mapping quality 3 or lower
are excluded. An alignment is \emph{wrong} if after correcting clipping, its
start position is at least 5bp away from the simulated position. $10^6$ pairs
start position is within 20bp from the simulated position. $10^6$ pairs
of 101bp reads are simulated from the human reference genome using wgsim
(http://bit.ly/wgsim2) with 1.5\% substitution errors and 0.2\% indel variants.
The insert size follows a normal distribution $N(500,50^2)$. The reads are
aligned back to the genome either as single end (SE; top panel) or as paired
end (PE; bottom panel). GEM is configured to allow up to 5 gaps and to output
suboptimal alignments (option `--e5 --m5 --s1' for SE and `--e5 --m5 --s1 --pb'
for PE). GEM does not compute mapping quality. Its mapping quality is estimated
with a BWA-like algorithm with suboptimal
alignments available. Other mappers are run with the default setting except
for specifying the insert size distribution. The run time in seconds on a
single CPU core is shown in the parentheses.}\label{fig:eval}
with a BWA-like algorithm with suboptimal alignments available. Other mappers
are run with the default setting except for specifying the insert size
distribution.
%GSNAP and STAR essentially do not report mapping quality, so they are not shown in the plot.
%For SE, GSNAP's accuracy is placed at $(2.8\times10^{-3},96.1\%)$ in the plot and STAR at $(4.6\times10^{-3},95.5\%)$.
%For PE, GSNAP achieves $(1.1\times10^{-3},97.8\%)$ accuracy and STAR $(2.0\times10^{-3},96.8\%)$.
The run time in seconds on a single CPU core is shown in the
parentheses.}\label{fig:eval}
\end{figure}

\subsection{Paired-end mapping}
Expand Down Expand Up @@ -177,10 +181,11 @@ \subsubsection{Pairing} Given the $i$-th hit for the first read and $j$-th hit
\section{Results and Discussions}

We evaluated the performance of BWA-MEM on simulated data together with
NovoAlign (http://novocraft.com), GEM, Bowtie2, Cushaw2, BWA-SW and BWA
(Figure~\ref{fig:eval}). On accuracy, NovoAlign is the best. BWA-MEM is close
to NovoAlign for PE reads and is comparable to GEM and Cushaw2 for SE. On
speed, BWA-MEM is similar to GEM and Bowtie2 for this data set, but is about
NovoAlign (http://novocraft.com), GEM, Bowtie2, Cushaw2, SeqAlto~\citep{Mu:2012fk}, %GSNAP~\citep{Wu:2010uq}, STAR~\citep{Dobin:2013kx},
BWA-SW
and BWA (Figure~\ref{fig:eval}). On accuracy, NovoAlign is the best. BWA-MEM
is close to NovoAlign for PE reads and is comparable to GEM and Cushaw2 for SE.
On speed, BWA-MEM is similar to GEM and Bowtie2 for this data set, but is about
6 times as fast as Bowtie2 and Cushaw2 for a 650bp long-read data set.

We speculate BWA-MEM is more performant for longer reads firstly because of its
Expand All @@ -191,17 +196,17 @@ \section{Results and Discussions}
alignment but lacked in most NGS long-read mappers. Our earlier
paper~\citep{Li:2010fk} discussed this in details.

%To evaluate the performance for even longer query sequences, we aligned the
%{\it E. coli} K-12 MG1655 substrain (AC:NC\_000913; 4.6Mb in length) against
%the 536 strain (AC:NC\_008253) with both BWA-MEM and
%nucmer~\citep{Kurtz:2004zr}. Nucmer finished the alignment in 25 seconds,
%giving 105,505 substitution differences between the strains. BWA-MEM is slower.
%It finished the alignment in 131 seconds, reporting 104,321 substitutions of
%which 102,241 overlapping with the nucmer output. Manual examination of the
%substitutions unique to each aligner reveals that most of them fall in short
%regions with very high divergence. It is unclear which aligner is better. Note
%that although nucmer is faster in the evaluation, only BWA-MEM scales well to
%large genomes.
To evaluate the performance for even longer query sequences, we aligned the
{\it E. coli} K-12 MG1655 substrain (AC:NC\_000913; 4.6Mb in length) against
the 536 strain (AC:NC\_008253) with both BWA-MEM and
nucmer~\citep{Kurtz:2004zr}. Nucmer finished the alignment in 25 seconds,
giving 105,505 substitution differences between the strains. BWA-MEM is slower.
It finished the alignment in 131 seconds, reporting 104,321 substitutions of
which 102,241 overlapping with the nucmer output. Manual examination of the
substitutions unique to each aligner reveals that most of them fall in short
regions with very high divergence. It is unclear which aligner is better. Note
that although nucmer is faster in the evaluation, only BWA-MEM scales well to
large genomes.

BWA-MEM is a fast and accurate aligner for sequence reads and is one of the few
that work well for both 70bp reads and long sequences up to a few megabases.
Expand Down
89 changes: 39 additions & 50 deletions eval/plot-color.gp
Original file line number Diff line number Diff line change
Expand Up @@ -6,78 +6,67 @@ set log x;
set format x "10^{%L}"
set yran [84:100]

set pointsize 1.5

set t po eps co enhance "Helvetica,16"
set t po eps co enhance "Helvetica,17"

set style line 1 lt 1 lc rgb "#FF0000" lw 3;
set style line 2 lt 1 lc rgb "#00C000" lw 3;
set style line 3 lt 1 lc rgb "#0080FF" lw 3;
set style line 4 lt 1 lc rgb "#C000FF" lw 3;
set style line 5 lt 1 lc rgb "#00EEEE" lw 3;
set style line 6 lt 1 lc rgb "#C04000" lw 3;
set style line 7 lt 1 lc rgb "#C8C800" lw 3;
set style line 8 lt 1 lc rgb "#FF80FF" lw 3;
set style line 9 lt 1 lc rgb "#4E642E" lw 3;
set style line 10 lt 1 lc rgb "#800000" lw 3;
set style line 11 lt 1 lc rgb "#67B7F7" lw 3;
set style line 12 lt 1 lc rgb "#FFC127" lw 3;
set style line 13 lt 1 lc rgb "#000000" lw 3;
set style line 14 lt 1 lc rgb "#9F0000" lw 3;

set style line 31 lt 2 lc rgb "#FF0000" lw 3;
set style line 32 lt 2 lc rgb "#00C000" lw 3;
set style line 33 lt 2 lc rgb "#0080FF" lw 3;
set style line 34 lt 2 lc rgb "#C000FF" lw 3;
set style line 35 lt 2 lc rgb "#00EEEE" lw 3;
set style line 36 lt 2 lc rgb "#C04000" lw 3;
set style line 37 lt 2 lc rgb "#C8C800" lw 3;
set style line 38 lt 2 lc rgb "#FF80FF" lw 3;
set style line 39 lt 2 lc rgb "#4E642E" lw 3;
set style line 40 lt 2 lc rgb "#800000" lw 3;
set style line 41 lt 2 lc rgb "#67B7F7" lw 3;
set style line 42 lt 2 lc rgb "#FFC127" lw 3;
set style line 1 lt 1 pt 1 lc rgb "#FF0000" lw 3;
set style line 2 lt 1 pt 2 lc rgb "#00C000" lw 3;
set style line 3 lt 1 pt 3 lc rgb "#0080FF" lw 3;
set style line 4 lt 1 pt 4 lc rgb "#C000FF" lw 3;
set style line 5 lt 1 pt 5 lc rgb "#00EEEE" lw 3;
set style line 6 lt 1 pt 6 lc rgb "#C04000" lw 3;
set style line 7 lt 1 pt 7 lc rgb "#C8C800" lw 3;
set style line 8 lt 1 pt 8 lc rgb "#FF80FF" lw 3;
set style line 9 lt 1 pt 9 lc rgb "#4E642E" lw 3;
set style line 10 lt 1 pt 10 lc rgb "#800000" lw 3;
set style line 11 lt 1 pt 11 lc rgb "#67B7F7" lw 3;
set style line 12 lt 1 pt 12 lc rgb "#FFC127" lw 3;
set style line 13 lt 1 pt 13 lc rgb "#000000" lw 3;
set style line 14 lt 1 pt 14 lc rgb "#9F0000" lw 3;

set out "alnroc-color-pe.eps"
set key bot right box width -3 font "Helvetica,16"
set key bot right box width -3 font "Helvetica,17"

set xlab "#wrong mappings / #mapped (10^6 {/Symbol \264} 2{/Symbol \264}101bp PE, 1.5% sub, 0.2% indel)"
set label "star (<135s)" at 0.0015,96.3 front
set label "gsnap (3514s)" at 0.0008,98.4 front tc rgb "#9F0000"

#set label "star (<135s)" at 0.0015,96.3 front
#set label "gsnap (3514s)" at 0.0008,98.4 front tc rgb "#9F0000"
#"<echo 255 1936412 3793" u ($3/$2):($2/20000) t '' w p ls 13 pt 5, \
#"<echo 255 1955615 2213" u ($3/$2):($2/20000) t '' w p ls 14 pt 5
#"<awk '$3' r12-pe.mem-fast.eval" u ($3/$2):($2/20000) t "mem-fast (286s)" w lp ls 10 lw 2,
#"<awk '$1>3&&$3' r12-pe.smalt.eval" u ($3/$2):($2/20000) t "smalt (1705s)" w lp ls 10 lw 2,
#"<awk '$3' r12-pe.last.eval" u ($3/$2):($2/20000) t "last (5386s)" w lp ls 8 lw 2

plot "<awk '$3' r12-pe.mem.eval" u ($3/$2):($2/20000) t "bwa-mem (497s)" w lp ls 1 lw 2, \
"<awk '$3' r12-pe.gem-bp-e5-s1.eval" u ($3/$2):($2/20000) t "gem (529s)" w lp ls 7 lw 2, \
"<awk '$3' r12-pe.gem-bp-e5-s1.eval" u ($3/$2):($2/20000) t "gem (529s)" w lp ls 8 lw 2, \
"<awk '$1>1&&$3' r12-pe.bt2.eval" u ($3/$2):($2/20000) t "bowtie2 (545s)" w lp ls 2 lw 2, \
"<awk '$3' r12-pe.alto.eval" u ($3/$2):($2/20000) t "seqalto (879s)" w lp ls 12 lw 2, \
"<awk '$3' r12-pe.cushaw.eval" u ($3/$2):($2/20000) t "cushaw2 (1026s)" w lp ls 3 lw 2, \
"<awk '$3' r12-pe.bwasw.eval" u ($3/$2):($2/20000) t "bwa-sw (1043s)" w lp ls 4 lw 2, \
"<awk '$3' r12-pe.bwa.eval" u ($3/$2):($2/20000) t "bwa (1092s)" w lp ls 9 lw 2, \
"<awk '$1>3&&$3' r12-pe.smalt.eval" u ($3/$2):($2/20000) t "smalt (1705s)" w lp ls 10 lw 2, \
"<awk '$1>3&&$3' r12-pe.novo.eval" u ($3/$2):($2/20000) t "novoalign (2585s)" w lp ls 6 lw 2, \
"<awk '$3' r12-pe.last.eval" u ($3/$2):($2/20000) t "last (5386s)" w lp ls 8 lw 2, \
"<echo 255 1936412 3793" u ($3/$2):($2/20000) t '' w p ls 13 pt 5, \
"<echo 255 1955615 2213" u ($3/$2):($2/20000) t '' w p ls 14 pt 5
unset label
"<awk '$1>3&&$3' r12-pe.novo.eval" u ($3/$2):($2/20000) t "novoalign (2585s)" w lp ls 6 lw 2

set out "alnroc-color-se.eps"
set key top left font "Helvetica,16" width -3

set xlab "#wrong mappings / #mapped (2{/Symbol \264}10^6 {/Symbol \264} 101bp SE, 1.5% sub, 0.2% indel)"
set xlab "#wrong / #mapped (2{/Symbol \264}10^6 {/Symbol \264} 101bp SE, 1.5% sub, 0.2% indel; correct = within 20bp)"

#set label "star (<128s)" at 0.0015,96.2 front
set xran [1e-7:1e-2]
set label "star (<128s)" at 0.0015,94.9 front
set label "gsnap (5553s)" at 0.0013,96.7 front tc rgb "#9F0000"
plot "<awk '$3' r12-se.sr.eval" u ($3/$2):($2/20000) t "subread (133s)" w lp ls 11 lw 2, \
"<awk '$3' r12-se.gem-e5.eval" u ($3/$2):($2/20000) t "gem (426s)" w lp ls 7 lw 2, \
#set label "star (<128s)" at 0.0015,94.9 front
#set label "gsnap (5553s)" at 0.0013,96.7 front tc rgb "#9F0000"
#"<echo 255 1909382 8711" u ($3/$2):($2/20000) t '' w p ls 13 pt 5, \
#"<echo 255 1922513 5432" u ($3/$2):($2/20000) t '' w p ls 14 pt 5
#"<awk '$3' r12-se.sr.eval" u ($3/$2):($2/20000) t "subread (101s)" w lp ls 11 lw 2,
#"<awk '$1>3&&$3' r12-se.smalt.eval" u ($3/$2):($2/20000) t "smalt (3859s)" w lp ls 10 lw 2, \
#"<awk '$3' r12-se.last.eval" u ($3/$2):($2/20000) t "last (4302s)" w lp ls 8 lw 2

plot "<awk '$3' r12-se.gem-e5.eval" u ($3/$2):($2/20000) t "gem (426s)" w lp ls 8 lw 2, \
"<awk '$3' r12-se.mem.eval" u ($3/$2):($2/20000) t "bwa-mem (467s)" w lp ls 1 lw 2, \
"<awk '$1>1&&$3' r12-se.bt2.eval" u ($3/$2):($2/20000) t "bowtie2 (582s)" w lp ls 2 lw 2, \
"<awk '$3' r12-se.alto.eval" u ($3/$2):($2/20000) t "seqalto (850s)" w lp ls 12 lw 2, \
"<awk '$3' r12-se.cushaw.eval" u ($3/$2):($2/20000) t "cushaw2 (967s)" w lp ls 3 lw 2, \
"<awk '$3' r12-se.bwasw.eval" u ($3/$2):($2/20000) t "bwa-sw (1002s)" w lp ls 4 lw 2, \
"<awk '$3' r12-se.bwa.eval" u ($3/$2):($2/20000) t "bwa (1055s)" w lp ls 9 lw 2, \
"<awk '$1>3&&$3' r12-se.smalt.eval" u ($3/$2):($2/20000) t "smalt (3859s)" w lp ls 10 lw 2, \
"<awk '$1>3&&$3' r12-se.novo.eval" u ($3/$2):($2/20000) t "novoalign (3960s)" w lp ls 6 lw 2, \
"<awk '$3' r12-se.last.eval" u ($3/$2):($2/20000) t "last (4302s)" w lp ls 8 lw 2, \
"<echo 255 1909382 8711" u ($3/$2):($2/20000) t '' w p ls 13 pt 5, \
"<echo 255 1922513 5432" u ($3/$2):($2/20000) t '' w p ls 14 pt 5


"<awk '$1>3&&$3' r12-se.novo.eval" u ($3/$2):($2/20000) t "novoalign (3960s)" w lp ls 6 lw 2
6 changes: 2 additions & 4 deletions eval/plot.gp
Original file line number Diff line number Diff line change
Expand Up @@ -14,30 +14,28 @@ set t po eps mono enhance "Helvetica,18"
set out "alnroc-pe.eps"
set key bot right box width -3 font "Helvetica,17"

set xlab "#wrong alignments / #aligned (10^6 {/Symbol \264} 2{/Symbol \264}101bp PE, 1.5% sub, 0.2% indel)"
set xlab "#wrong mappings / #mapped (10^6 {/Symbol \264} 2{/Symbol \264}101bp PE, 1.5% sub, 0.2% indel)"
plot "<awk '$3' r12-pe.mem.eval" u ($3/$2):($2/20000) t "bwa-mem (497s)" w lp ls 1 lw 2, \
"<awk '$3' r12-pe.gem-bp-e5-s1.eval" u ($3/$2):($2/20000) t "gem (529s)" w lp ls 7 lw 2, \
"<awk '$1>1&&$3' r12-pe.bt2.eval" u ($3/$2):($2/20000) t "bowtie2 (545s)" w lp ls 2 lw 2, \
"<awk '$3' r12-pe.cushaw.eval" u ($3/$2):($2/20000) t "cushaw2 (1026s)" w lp ls 3 lw 2, \
"<awk '$3' r12-pe.bwasw.eval" u ($3/$2):($2/20000) t "bwa-sw (1043s)" w lp ls 4 lw 2, \
"<awk '$3' r12-pe.bwa.eval" u ($3/$2):($2/20000) t "bwa (1092s)" w lp ls 8 lw 2, \
"<awk '$1>3&&$3' r12-pe.novo.eval" u ($3/$2):($2/20000) t "novoalign (2585s)" w lp ls 6 lw 2
# "<echo 255 1936412 3793" u ($3/$2):($2/20000) t '' w p ls 5

# "<awk '$3' r12-pe.last.eval" u ($3/$2):($2/20000) t "last (5386s)" w lp ls 8 lw 2

set out "alnroc-se.eps"
set key top left font "Helvetica,17" width -4

set xlab "#wrong alignments / #aligned (2{/Symbol \264}10^6 {/Symbol \264} 101bp SE, 1.5% sub, 0.2% indel)"
set xlab "#wrong mappings / #mapped (2{/Symbol \264}10^6 {/Symbol \264} 101bp SE, 1.5% sub, 0.2% indel)"
plot "<awk '$3' r12-se.gem-e5.eval" u ($3/$2):($2/20000) t "gem (426s)" w lp ls 7 lw 2, \
"<awk '$3' r12-se.mem.eval" u ($3/$2):($2/20000) t "bwa-mem (467s)" w lp ls 1 lw 2, \
"<awk '$1>1&&$3' r12-se.bt2.eval" u ($3/$2):($2/20000) t "bowtie2 (582s)" w lp ls 2 lw 2, \
"<awk '$3' r12-se.cushaw.eval" u ($3/$2):($2/20000) t "cushaw2 (967s)" w lp ls 3 lw 2, \
"<awk '$3' r12-se.bwasw.eval" u ($3/$2):($2/20000) t "bwa-sw (1002s)" w lp ls 4 lw 2, \
"<awk '$3' r12-se.bwa.eval" u ($3/$2):($2/20000) t "bwa (1055s)" w lp ls 8 lw 2, \
"<awk '$1>3&&$3' r12-se.novo.eval" u ($3/$2):($2/20000) t "novoalign (3960s)" w lp ls 6 lw 2
# "<echo 255 1909382 8711" u ($3/$2):($2/20000) t '' w p ls 5

# "<awk '$3' r12-se.last.eval" u ($3/$2):($2/20000) t "last (4302s)" w lp ls 8 lw 2

Expand Down
Binary file added mem-flow.pdf
Binary file not shown.
Binary file added mem-flow.ppt
Binary file not shown.
Loading

0 comments on commit af132e4

Please sign in to comment.