.TH minimap 1 "06 December 2015" "minimap-0.2" "Bioinformatics tools"
.SH NAME
.PP
minimap - fast mapping between long DNA sequences
.SH SYNOPSIS
.PP
minimap
.RB [ -lSOV ]
.RB [ -k
.IR kmer ]
.RB [ -w
.IR winSize ]
.RB [ -I
.IR batchSize ]
.RB [ -d
.IR dumpFile ]
.RB [ -f
.IR occThres ]
.RB [ -r
.IR bandWidth ]
.RB [ -m
.IR minShared ]
.RB [ -c
.IR minCount ]
.RB [ -L
.IR minMatch ]
.RB [ -g
.IR maxGap ]
.RB [ -T
.IR dustThres ]
.RB [ -t
.IR nThreads ]
.RB [ -x
.IR preset ]
.I target.fa
.I query.fa
>
.I output.paf
.SH DESCRIPTION
.PP
Minimap is a tool to efficiently find multiple approximate mapping positions
between two sets of long sequences, such as between reads and reference
genomes, between genomes and between long noisy reads. Minimap has an indexing
and a mapping phase. In the indexing phase, it collects all minimizers of a
large batch of target sequences in a hash table; in the mapping phase, it
identifies good clusters of colinear minimizer hits. Minimap does not generate
detailed alignments between the target and the query sequences; it only outputs
the approximate start and end coordinates of these clusters.
.SH OPTIONS
.SS Indexing options
.TP 10
.BI -k \ INT
Minimizer k-mer length [15]
.TP
.BI -w \ INT
Minimizer window size [2/3 of k-mer length]. A minimizer is the smallest k-mer
in a window of w consecutive k-mers.
.TP
.BI -I \ NUM
Load at most
.I NUM
target bases into RAM for indexing [4G]. If there are more than
.I NUM
bases in
.IR target.fa ,
minimap needs to read
.I query.fa
multiple times to map it against each batch of target sequences.
.I NUM
may end with a k/K/m/M/g/G suffix.
.TP
.BI -d \ FILE
Dump minimizer index to
.I FILE
[no dump]
.TP
.B -l
Indicate that
.I target.fa
is in fact a minimizer index generated by option
.BR -d ,
not a FASTA or FASTQ file.
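.PP
The definition given under
.B -w
can be illustrated with the following Python sketch, which picks the smallest
k-mer in each window of w consecutive k-mers. It is a minimal sketch under
stated assumptions, not minimap's implementation: k-mers are compared as plain
strings and the reverse strand is ignored, whereas the ordering minimap
actually uses is not described in this manual. The function name
.I minimizers
is hypothetical. The defaults mirror k=15 and w=10 (2/3 of k).
.PP
.nf
```python
def minimizers(seq, k=15, w=10):
    """Return sorted (position, k-mer) pairs chosen as window minimizers."""
    # All k-mers of the sequence, indexed by their start position.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Smallest k-mer in this window; ties go to the leftmost one.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)
```
.fi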
.SS Mapping options
.TP 10
.BI -f \ FLOAT
Ignore the top
.I FLOAT
fraction of the most frequently occurring minimizers [0.001]
.TP
.BI -r \ INT
Approximate bandwidth for the initial clustering of minimizer hits [500]. A
.I minimizer hit
is a minimizer present in both the target and query sequences. A
.I minimizer hit cluster
is a group of potentially colinear minimizer hits between a target and a query
sequence.
.TP
.BI -m \ FLOAT
Merge initial minimizer hit clusters if the clusters share a fraction of
.I FLOAT
or more of their minimizers [0.5]
.TP
.BI -c \ INT
Retain a minimizer hit cluster if it contains
.I INT
or more minimizer hits [4]
.TP
.BI -L \ INT
Discard a minimizer hit cluster if, after colinearization, the number of matching bases is below
.I INT
[40]. This option mainly reduces the size of output. It has little effect on
the speed and peak memory.
.TP
.BI -g \ INT
Split a minimizer hit cluster at a gap
.IR INT -bp
or longer that does not contain any minimizer hits [10000]
.TP
.BI -T \ INT
Mask regions on query sequences with SDUST score threshold
.IR INT ;
0 to disable [0]. SDUST is an algorithm
to identify low-complexity subsequences. It is not enabled by default. If SDUST
is preferred, a value between 20 and 25 is recommended. A higher threshold masks
less sequence.
.TP
.B -S
Perform all-vs-all mapping. In this mode, if the query sequence name is
lexicographically larger than the target sequence name, the hits between them
will be suppressed; if the query sequence name is the same as the target name,
diagonal minimizer hits will also be suppressed.
.TP
.B -O
Drop a minimizer hit if it is far away from other hits (EXPERIMENTAL). This
option is useful for mapping long chromosomes from two diverged species.
.TP
.BI -x \ STR
Change multiple settings at once based on the preset
.I STR
[not set]. It is recommended to give this option before other options, so
that later options can override the settings changed by the preset.
.RS
.TP 8
.B ava10k
for PacBio or Oxford Nanopore all-vs-all read mapping (-Sw5 -L100 -m0).
.RE
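.PP
The suppression rule described under
.B -S
can be sketched in Python as follows. This is an illustration only: the
function name
.I keep_hit
is hypothetical, names are compared with Python's string ordering, and a
diagonal hit is assumed to be one at the same position in the query and the
target.
.PP
.nf
```python
def keep_hit(query_name, query_pos, target_name, target_pos):
    """Return True if a minimizer hit would survive all-vs-all filtering."""
    # Hits are suppressed when the query name sorts after the target name,
    # so each unordered pair of sequences is reported only once.
    if query_name > target_name:
        return False
    # A sequence mapped to itself: drop the trivial diagonal hits.
    if query_name == target_name and query_pos == target_pos:
        return False
    return True
```
.fi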
.SS Input/output options
.TP 10
.BI -t \ INT
Number of threads [3]. Minimap uses at most three threads when collecting
minimizers on target sequences, and uses up to
.IR INT +1
threads when mapping (the extra thread is for I/O, which is frequently idle and
takes little CPU time).
.TP
.B -V
Print version number to stdout
.SH OUTPUT FORMAT
.PP
Minimap outputs mapping positions in the Pairwise mApping Format (PAF). PAF is
a TAB-delimited text format in which each line consists of at least 12 fields,
as described in the following table:
.TS
center box;
cb | cb | cb
r | c | l .
Col	Type	Description
_
1	string	Query sequence name
2	int	Query sequence length
3	int	Query start coordinate (0-based)
4	int	Query end coordinate (0-based)
5	char	`+' if query and target on the same strand; `-' if opposite
6	string	Target sequence name
7	int	Target sequence length
8	int	Target start coordinate on the original strand
9	int	Target end coordinate on the original strand
10	int	Number of matching bases in the mapping
11	int	Number of bases, including gaps, in the mapping
12	int	Mapping quality (0-255 with 255 for missing)
.TE
.PP
When the alignment is available, column 11 gives the total number of sequence
matches, mismatches and gaps in the alignment; column 10 divided by column 11
gives the alignment identity. As minimap does not generate detailed alignments,
these two columns are approximate. PAF may optionally have additional fields in
the SAM-like typed key-value format. Minimap writes the number of minimizer
hits in a cluster to the cm tag.
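.PP
A PAF line as described above might be read with a Python sketch like the
following. The field names (qname, nmatch, alen and so on) and the function
name
.I parse_paf_line
are hypothetical labels chosen for this example; optional fields are assumed
to follow the SAM key:type:value layout, for instance cm:i:12.
.PP
.nf
```python
TAB = chr(9)  # PAF fields are separated by TAB characters

NAMES = ("qname", "qlen", "qstart", "qend", "strand",
         "tname", "tlen", "tstart", "tend", "nmatch", "alen", "mapq")
INT_FIELDS = {"qlen", "qstart", "qend", "tlen", "tstart", "tend",
              "nmatch", "alen", "mapq"}

def parse_paf_line(line):
    """Parse one PAF line into a dict of the 12 mandatory fields plus tags."""
    cols = line.rstrip().split(TAB)
    if len(cols) < 12:
        raise ValueError("a PAF line needs at least 12 fields")
    rec = {}
    for name, value in zip(NAMES, cols):
        rec[name] = int(value) if name in INT_FIELDS else value
    # Optional fields look like key:type:value; integers have type "i".
    rec["tags"] = {}
    for field in cols[12:]:
        key, typ, value = field.split(":", 2)
        rec["tags"][key] = int(value) if typ == "i" else value
    # Approximate identity: column 10 divided by column 11.
    rec["identity"] = rec["nmatch"] / rec["alen"] if rec["alen"] else 0.0
    return rec
```
.fi
.PP
Because minimap does not compute detailed alignments, the identity obtained
this way is only an estimate.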
.SH SEE ALSO
.PP
miniasm(1)