-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathmerge_pair.html
175 lines (172 loc) · 7.16 KB
/
merge_pair.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta content="en-us" http-equiv="Content-Language"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<title>
allpairs_global command
</title>
<link href="stylesx.css" rel="stylesheet" type="text/css"/>
<style type="text/css">
body.c4 {background-color:#c0c0c0;}
div.c3 {position:absolute; top:45px; left:20px; width:830px; background-color:#ffffff; border-width:10px; border-style:solid;border-color:white;}
span.c2 {font-weight: bold}
div.c1 {position:absolute; top:10px; left:20px; width:850px; height:60px;}
.TopButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.TopButton { color:white; }
a.TopButton:link { text-decoration:none; }
a.TopButton:visited { text-decoration:none; }
a.TopButton:hover { color:orange; }
.NewButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.NewButton { color:white; }
a.NewButton:link { text-decoration:none; }
a.NewButton:visited { text-decoration:none; }
a.NewButton:hover { color:orange; }
.SideButtonPara { color:white; font-family:Arial, Helvetica, sans-serif; font-size:9pt; font-weight:normal; text-align:center; line-height:18px; }
.SideButton { color:white; }
a.SideButton:link { text-decoration:none; }
a.SideButton:visited { text-decoration:none; }
a.SideButton:hover { color:orange; }
</style>
</head>
<body style="background-color:#c0c0c0;">
<div>
<a href="https://drive5.com/usearch">
<img alt="USEARCH v12" src="usearch12_banner.jpg" style="position:absolute; top:40px; left:10px; padding:0px; border:0px;"/>
</a>
</div>
<div style="position:absolute; top:115px; left:10px; width:850px; background-color:#ffffff; min-height:500px">
<div style="position:relative; float:left; background-color:#696969; width:125px; left: 0px; min-height:500px; padding:5px; height: 125px;">
<div class="SideButtonPara" style="text-align:center; padding-top:5px;">
<a class="SideButton" href="index.html">
Docs home
</a>
<br/>
<hr style="border:0; border-bottom: 1px solid white;"/>
<a class="SideButton" href="cmds.html">
Commands
</a>
<br/>
<a class="SideButton" href="topics.html">
Topics
</a>
<br/>
<a class="SideButton" href="citation.html">
Publications
</a>
<br/>
</div>
</div>
<div class="ManText" style="left:20px; position: absolute; left:135px; width:695px; background-color:white; padding:10px">
<h1>
Merging (assembly) of paired reads
</h1>
<span class="ManText">
<b>
See also
<br/>
</b>
<a href="cmd_fastq_mergepairs.html">
fastq_mergepairs command
</a>
<br/>
<b>
</b>
<a href="merge_bench.html">
Paired read assembler benchmark results
</a>
<br/>
<b>
</b>
<a href="citation.html">
Edgar & Flyvbjerg 2014 paper on expected errors and
Bayesianassemblyassembly
</a>
<b>
<br/>
</b>
<a href="fastq_files.html">
FASTQ files
</a>
<br/>
<a href="quality_score.html">
Quality scores
</a>
<br/>
<a href="merge_stagger.html">
Staggered pairs
</a>
</span>
<p class="ManText">
<span class="ManIndText ManText">
<strong>
Paired read merging (assembly)
<br/>
</strong>
Illumina machine can generate paired reads where constructs are sequenced in
both directions.
</span>
</p>
<p class="ManText">
<span class="ManIndText ManText">
When the forward and reverse reads overlap, they can be "assembled" or "merged" to give a single sequence. I prefer to call this merging rather than assembly to distinguish from whole-genome assembly using shotgun reads. People who use the term assemblers call the result of merging a pair a "contig", I call it a "consensus sequence".
<br/>
<br/>
<strong>
Consensus sequence
<br/>
</strong>
Paired read mergers generate consensus sequences by aligning the forward and reverse reads and resolving any mismatches found in the alignment. The reverse read is reverse-complemented so that its sequence is oriented on the same strand as the forward read. Usually, the alignment covers part of the forward and reverse reads leaving unaligned segments at the beginning of both reads. However, if the sequencing construct is shorter than the read length then the
<a href="merge_stagger.html">
alignment is staggered
</a>
with unaligned segments at the ends rather than the beginning of the reads. By default, the fastq_mergepairs command trims (discards) the unaligned segments if the alignment is staggered.
<br/>
<br/>
<img alt="Image" class="auto-style2" src="merge.gif"/>
<br/>
<br/>
<strong>
Posterior quality scores
</strong>
<br/>
If the base calls match, the Q score should increase. If there is a mismatch, the Q score should be usually be reduced, but not always. The Q score at a mismatch position may stay the same or even increase if one of the reads has a low enough Q score that the base call is predicted to be wrong, e.g. Q=2 which indicates an error probability of 63.1%. The error probabilities after merging are called posterior probabilities, and the corresponding Q scores are called posterior Qs. The term "posterior" comes from Bayesian statistics, meaning after new evidence has been taken into account. The equations for calculating posterior Qs are given in
<a href="citation.html">
Edgar & Flyvbjerg (2014)
</a>
.
</span>
</p>
<p class="ManText">
Calculating correct posterior Qs is important because it enables an improved
<a href="exp_errs.html">
estimate of the number of errors
</a>
in the read.
</p>
<p class="ManText">
<span class="ManIndText">
<img alt="Image" border="1" src="overlapper.jpg"/>
</span>
</p>
<p class="ManText">
<span class="ManIndText">
<strong>
Other programs calculate incorrect posterior Q scores
<br/>
</strong>
Most assemblers generate posterior Q scores which are obviously wrong in at least some situations (see
<a href="merge_bench.html">
benchmark results
</a>
). The best (i.e., worst) example is PANDAseq, which reports lower Q scores in most aligned positions even if there is agreement between the two reads (tested through v2.8).
<br/>
</span>
</p>
</div>
</div>
</body>
</html>