forked from wtsi-npg/illumina2bam
This is a collection of tools for generating or processing BAM/SAM files using the Picard Java API.
It currently includes Illumina2bam, BamIndexDecoder, BamReadTrimmer, BamMerger, BamTagStripper, BamQualityQuantisation, ChangeBamHeader, SplitBamByReadGroup, AlignmentFilter and others.
It is a NetBeans Java project, so you should be able to open it directly in NetBeans.
To run the tests: ant test
To build the jar files: ant jar
You can find more information at http://gq1.github.com/illumina2bam/.
---------------------------------------------------
Illumina2bam
This tool converts Illumina BCL files directly to BAM files using the Picard Java API.
It creates one BAM file for an entire lane.
It stores the index read sequence and quality on the first read record, by default in the tags "BC" and "QT" (the names we suggest using), e.g. BC:Z:ATCACGTT QT:Z:HHHHHHHH.
Users can override these tag names from the command line.
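The tag values are ordinary SAM optional fields of type Z (string), and the quality string uses the usual Phred+33 encoding, so "H" stands for quality 39. The Java sketch below shows how such tag strings could be formed; the class and method names are hypothetical and this is not Illumina2bam's own code, which writes tags through the Picard API.

```java
import java.util.Arrays;

public class BarcodeTagsSketch {
    // Encodes Phred quality scores as a SAM-style quality string (Phred+33).
    static String toPhred33(int[] quals) {
        StringBuilder sb = new StringBuilder(quals.length);
        for (int q : quals) {
            if (q < 0 || q > 93) {
                throw new IllegalArgumentException("Phred quality out of range: " + q);
            }
            sb.append((char) (q + 33));
        }
        return sb.toString();
    }

    // Formats a SAM optional field of type Z (string), e.g. "BC:Z:ATCACGTT".
    static String stringTag(String tagName, String value) {
        if (tagName.length() != 2) {
            throw new IllegalArgumentException("SAM tag names are two characters long: " + tagName);
        }
        return tagName + ":Z:" + value;
    }

    public static void main(String[] args) {
        String indexRead = "ATCACGTT";
        int[] quals = new int[indexRead.length()];
        Arrays.fill(quals, 39); // 39 + 33 = 72, the ASCII code of 'H'
        // "BC" and "QT" are the default tag names; both can be overridden.
        System.out.println(stringTag("BC", indexRead) + " " + stringTag("QT", toPhred33(quals)));
    }
}
```

Running main prints "BC:Z:ATCACGTT QT:Z:HHHHHHHH", matching the example above.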
You can pipe the output BAM into BamIndexDecoder to demultiplex it.
It records the upstream Illumina software that was run, e.g. RTA, in @PG header lines.
By default it only includes reads that pass the PF (passing filter) check; set PF_FILTER=false to include all reads in the BAM.
By default, bcl2qseq also marks sequences identified as TruSeq controls as not passing in the qseq filter field. We do not do this yet, so these reads are included in the output.
We expect read qualities to differ from those produced by default bcl2qseq, because we give Ns a quality of 0 (rather than 2, which we're not convinced about) and we do not apply the EAMSS(?) flattening of the 3' tail of reads to quality values of 2, a.k.a. "The Killer Bs".
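For reference, a BCL file stores one byte per cluster for each cycle. Assuming the standard layout from Illumina's BCL documentation, the two lowest bits give the base (A, C, G, T for 0 to 3), the upper six bits give the Phred quality, and an all-zero byte is reserved for a no-call. A minimal Java sketch of decoding such bytes, giving no-calls the quality 0 described above, might look like this; the class names are illustrative, and file headers, per-cycle files and filtering are not handled.

```java
public class BclByteSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // One decoded base call.
    static final class Call {
        final char base;
        final int quality;

        Call(char base, int quality) {
            this.base = base;
            this.quality = quality;
        }
    }

    // Decodes one BCL byte: bits 0-1 = base index, bits 2-7 = Phred quality.
    // An all-zero byte is a no-call, reported here as N with quality 0.
    static Call decode(byte b) {
        int v = b & 0xFF; // Java bytes are signed; treat the value as unsigned
        if (v == 0) {
            return new Call('N', 0);
        }
        return new Call(BASES[v & 0x03], v >>> 2);
    }

    public static void main(String[] args) {
        // One cluster across three cycles (each byte would come from a different cycle's file):
        // G with quality 30, a no-call, T with quality 40.
        byte[] oneClusterAcrossCycles = {(byte) ((30 << 2) | 2), 0, (byte) ((40 << 2) | 3)};
        StringBuilder seq = new StringBuilder();
        StringBuilder qual = new StringBuilder();
        for (byte b : oneClusterAcrossCycles) {
            Call c = decode(b);
            seq.append(c.base);
            qual.append((char) (c.quality + 33)); // Phred+33, as in SAM
        }
        System.out.println(seq + " " + qual);
    }
}
```

Running main prints "GNT ?!I": the no-call becomes "!", i.e. quality 0, rather than quality 2.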
There is also an option (GENERATE_SECONDARY_BASE_CALLS) to record the second base call from SCL files.
The RunInfo or runParameters XML files under the run folder, or the config XML files under the Intensities and BaseCalls directories, are used to get the tile list, the cycle range of each read and other metadata.
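As an illustration of deriving cycle ranges per read, the sketch below assumes a simplified, hypothetical RunInfo.xml excerpt in which each Read element carries a NumCycles attribute and reads are listed in sequencing order; it assigns consecutive 1-based cycle ranges. Real RunInfo files contain more elements, and Illumina2bam's own parsing may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CycleRangesSketch {
    // Returns {firstCycle, lastCycle} (1-based, inclusive) for each Read element,
    // assuming reads appear in the order they were sequenced.
    static List<int[]> cycleRanges(String runInfoXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(runInfoXml.getBytes(StandardCharsets.UTF_8)));
        NodeList reads = doc.getElementsByTagName("Read");
        List<int[]> ranges = new ArrayList<>();
        int nextCycle = 1;
        for (int i = 0; i < reads.getLength(); i++) {
            Element read = (Element) reads.item(i);
            int numCycles = Integer.parseInt(read.getAttribute("NumCycles"));
            ranges.add(new int[] {nextCycle, nextCycle + numCycles - 1});
            nextCycle += numCycles;
        }
        return ranges;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, simplified RunInfo.xml: 101-cycle read, 8-cycle index read, 101-cycle read.
        String xml = "<RunInfo><Run><Reads>"
                + "<Read Number=\"1\" NumCycles=\"101\" IsIndexedRead=\"N\"/>"
                + "<Read Number=\"2\" NumCycles=\"8\" IsIndexedRead=\"Y\"/>"
                + "<Read Number=\"3\" NumCycles=\"101\" IsIndexedRead=\"N\"/>"
                + "</Reads></Run></RunInfo>";
        List<int[]> ranges = cycleRanges(xml);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("read " + (i + 1) + ": cycles " + ranges.get(i)[0] + "-" + ranges.get(i)[1]);
        }
    }
}
```

For this excerpt the ranges come out as 1-101, 102-109 (the index read) and 110-210.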
CLOCS and filter files are also needed. If the clocs file is missing, it will try to use a locs or pos file instead.
By default, all reads are placed in a single read group with ID 1. Sample and library names can be passed on the command line; otherwise "unknown" is used.
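A sketch of how those read-group defaults could combine into an @RG header line, assuming the defaults listed in the options below (ID 1, LB unknown, SM falling back to the library name, PL ILLUMINA, CN SC). The method name and field order are illustrative, and PU and DT are left out.

```java
public class ReadGroupSketch {
    // Builds a tab-separated @RG header line; a null argument means "not given on the command line".
    static String readGroupLine(String id, String library, String sample, String platform, String center) {
        String rgId = (id != null) ? id : "1";
        String lb = (library != null) ? library : "unknown";
        String sm = (sample != null) ? sample : lb; // sample falls back to the library name
        String pl = (platform != null) ? platform : "ILLUMINA";
        String cn = (center != null) ? center : "SC";
        return String.join("\t", "@RG", "ID:" + rgId, "PL:" + pl, "LB:" + lb, "SM:" + sm, "CN:" + cn);
    }

    public static void main(String[] args) {
        System.out.println(readGroupLine(null, null, null, null, null));
        System.out.println(readGroupLine(null, "lib42", null, null, null)); // hypothetical library name
    }
}
```

With nothing given, both LB and SM end up as "unknown"; giving only a library name makes SM take that name too.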
EXAMPLE TO RUN:
java -jar "Illumina2bam.jar" INTENSITY_DIR=testdata/110323_HS13_06000_B_B039WABXX/Data/Intensities LANE=1 OUTPUT=testdata/6000_1.bam VALIDATION_STRINGENCY=STRICT CREATE_INDEX=false CREATE_MD5_FILE=true FIRST_TILE=1101 TILE_LIMIT=1 COMPRESSION_LEVEL=1
HELP:
java -jar "Illumina2bam.jar" -h
USAGE: Illumina2bam [options]
Convert Illumina BCL to BAM or SAM file. Version: 0.03
Options:
--help
-h Displays options specific to this tool.
--stdhelp
-H Displays options specific to this tool AND options common to all Picard command line
tools.
--version Displays program version.
RUN_FOLDER=File
R=File Illumina runfolder directory including runParameters xml file under it, upwards two
levels from Intensities directory if not given. Default value: null.
INTENSITY_DIR=File
I=File Illumina intensities directory including config xml file, and clocs, locs or pos files
under lane directory. Required.
BASECALLS_DIR=File
B=File Illumina basecalls directory including config xml file, and filter files, bcl, maybe scl
files under lane cycle directory, using BaseCalls directory under intensities if not
given. Default value: null.
LANE=Integer
L=Integer Lane number. Required.
OUTPUT=File
O=File Output file name. Required.
GENERATE_SECONDARY_BASE_CALLS=Boolean
E2=Boolean Including second base call or not, default false. Default value: false. This option can
be set to 'null' to clear the default value. Possible values: {true, false}
PF_FILTER=Boolean
PF=Boolean Filter cluster or not, default true. Default value: true. This option can be set to
'null' to clear the default value. Possible values: {true, false}
READ_GROUP_ID=String
RG=String ID used to link RG header record with RG tag in SAM record, default 1. Default value: 1.
This option can be set to 'null' to clear the default value.
SAMPLE_ALIAS=String
SM=String The name of the sequenced sample, using library name if not given. Default value: null.
LIBRARY_NAME=String
LB=String The name of the sequenced library, default unknown. Default value: unknown. This option
can be set to 'null' to clear the default value.
STUDY_NAME=String
ST=String The name of the study. Default value: null.
PLATFORM_UNIT=String
PU=String The platform unit, using runfolder name plus lane number if not given. Default value:
null.
RUN_START_DATE=Iso8601Date The start date of the run, read from config file if not given. Default value: null.
SEQUENCING_CENTER=String
SC=String Sequence center name, default SC for Sanger Center. Default value: SC. This option can
be set to 'null' to clear the default value.
PLATFORM=String The name of the sequencing technology that produced the read, default ILLUMINA. Default
value: ILLUMINA. This option can be set to 'null' to clear the default value.
FIRST_TILE=Integer If set, this is the first tile to be processed (for debugging). Note that tiles are not
processed in numerical order. Default value: null.
TILE_LIMIT=Integer If set, process no more than this many tiles (for debugging). Default value: null.
BARCODE_SEQUENCE_TAG_NAME=String
BC_SEQ=String Tag name for barcode sequence. Default value: BC. This option can be set to 'null' to
clear the default value.
BARCODE_QUALITY_TAG_NAME=String
BC_QUAL=String Tag name for barcode quality. Default value: QT. This option can be set to 'null' to
clear the default value.
SECOND_BARCODE_SEQUENCE_TAG_NAME=String
SEC_BC_SEQ=String Tag name for second barcode sequence. Default value: null.
SECOND_BARCODE_QUALITY_TAG_NAME=String
SEC_BC_QUAL=String Tag name for second barcode quality. Default value: null.
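FIRST_TILE and TILE_LIMIT can be read as selecting a slice of the tile list in the order the tool processes it, which need not be numerical. The sketch below assumes that interpretation, and also assumes that an unknown FIRST_TILE is an error; neither is documented behaviour, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class TileSelectionSketch {
    // Returns the tiles to process, keeping the processing order.
    // firstTile and tileLimit are null when FIRST_TILE / TILE_LIMIT are not set.
    static List<Integer> selectTiles(List<Integer> tilesInOrder, Integer firstTile, Integer tileLimit) {
        int start = 0;
        if (firstTile != null) {
            start = tilesInOrder.indexOf(firstTile);
            if (start < 0) {
                throw new IllegalArgumentException("FIRST_TILE not in tile list: " + firstTile);
            }
        }
        int end = tilesInOrder.size();
        if (tileLimit != null) {
            if (tileLimit < 0) {
                throw new IllegalArgumentException("TILE_LIMIT must not be negative: " + tileLimit);
            }
            end = (int) Math.min(end, (long) start + tileLimit); // long avoids int overflow
        }
        return tilesInOrder.subList(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical processing order, deliberately not numerical.
        List<Integer> tiles = Arrays.asList(1101, 1201, 1102, 1202, 2101);
        System.out.println(selectTiles(tiles, 1201, 2)); // [1201, 1102]
        System.out.println(selectTiles(tiles, 1101, 1)); // [1101], like the example run above
    }
}
```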
-----------------------------------------------
BamReadTrimmer
Strips part of a read at a fixed position, typically a prefix of the forward read, and optionally stores the trimmed bases and their qualities in BAM tags.
EXAMPLE TO RUN:
java -jar "BamReadTrimmer.jar" INPUT=testdata/bam/6210_8.sam OUTPUT=testdata/6210_8_trimmed.bam FIRST_POSITION_TO_TRIM=1 TRIM_LENGTH=3 CREATE_MD5_FILE=true ONLY_FORWARD_READ=true SAVE_TRIM=true TRIM_BASE_TAG=rs TRIM_QUALITY_TAG=qs VALIDATION_STRINGENCY=SILENT
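The core of a fixed-position trim can be sketched in Java as below, mirroring FIRST_POSITION_TO_TRIM (1-based) and TRIM_LENGTH and keeping the removed part for tags such as rs and qs. This is a simplified, hypothetical illustration on plain strings; it does not model ONLY_FORWARD_READ, reverse-strand reads or anything else BamReadTrimmer does with BAM records.

```java
public class ReadTrimSketch {
    // Remaining read plus the removed part, which could go into tags such as rs/qs.
    static final class Trimmed {
        final String bases;
        final String quals;
        final String trimmedBases;
        final String trimmedQuals;

        Trimmed(String bases, String quals, String trimmedBases, String trimmedQuals) {
            this.bases = bases;
            this.quals = quals;
            this.trimmedBases = trimmedBases;
            this.trimmedQuals = trimmedQuals;
        }
    }

    // firstPosition is 1-based, as in FIRST_POSITION_TO_TRIM; length as in TRIM_LENGTH.
    static Trimmed trim(String bases, String quals, int firstPosition, int length) {
        if (bases.length() != quals.length()) {
            throw new IllegalArgumentException("bases and qualities differ in length");
        }
        int from = firstPosition - 1;
        int to = from + length;
        if (from < 0 || length < 0 || to > bases.length()) {
            throw new IllegalArgumentException("trim range lies outside the read");
        }
        return new Trimmed(
                bases.substring(0, from) + bases.substring(to),
                quals.substring(0, from) + quals.substring(to),
                bases.substring(from, to),
                quals.substring(from, to));
    }

    public static void main(String[] args) {
        // Mirrors FIRST_POSITION_TO_TRIM=1 TRIM_LENGTH=3 on a hypothetical 8-base read.
        Trimmed t = trim("ACGTTGCA", "ABCDEFGH", 1, 3);
        System.out.println(t.bases + " " + t.quals + " rs:Z:" + t.trimmedBases + " qs:Z:" + t.trimmedQuals);
    }
}
```

Running main prints "TTGCA DEFGH rs:Z:ACG qs:Z:ABC".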
-----------------------------------------------
BamMerger
Merges the alignment information from an aligned BAM/SAM file with the data in an unmapped BAM file, producing a third BAM file that has the alignment data plus all the additional data from the unmapped BAM. The @SQ records and alignment @PG records in the aligned BAM file are added to the header of the unmapped BAM file to form the header of the output file.
EXAMPLE TO RUN:
java -jar "BamMerger.jar" ALIGNED=testdata/bam/6210_8_aligned.sam I=testdata/bam/6210_8.sam OUTPUT=testdata/6210_8_merged.bam VALIDATION_STRINGENCY=SILENT
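The header rule described above can be sketched on plain SAM header lines. In this hypothetical Java illustration, @HD from the unmapped BAM stays first (the SAM specification requires @HD, when present, to be the first line), the aligned file's @SQ lines follow, then the remaining unmapped header lines, and finally any aligned @PG lines not already present. That ordering and the exact-match de-duplication are assumptions; a real implementation would more likely work on header objects and would also need to handle clashing @PG IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeaderMergeSketch {
    // Builds the output header from the unmapped BAM header plus the aligned BAM's
    // @SQ lines and any @PG lines not already present in the unmapped header.
    static List<String> mergeHeaders(List<String> unmappedHeader, List<String> alignedHeader) {
        List<String> merged = new ArrayList<>();
        for (String line : unmappedHeader) {
            if (line.startsWith("@HD")) {
                merged.add(line); // @HD must come first
            }
        }
        for (String line : alignedHeader) {
            if (line.startsWith("@SQ")) {
                merged.add(line);
            }
        }
        for (String line : unmappedHeader) {
            if (!line.startsWith("@HD")) {
                merged.add(line); // all other unmapped-header lines, e.g. @RG and @PG
            }
        }
        for (String line : alignedHeader) {
            if (line.startsWith("@PG") && !unmappedHeader.contains(line)) {
                merged.add(line);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> unmapped = Arrays.asList(
                "@HD\tVN:1.0\tSO:unsorted",
                "@RG\tID:1\tSM:unknown",
                "@PG\tID:Illumina2bam");
        List<String> aligned = Arrays.asList(
                "@HD\tVN:1.0",
                "@SQ\tSN:chr1\tLN:1000",
                "@PG\tID:Illumina2bam",
                "@PG\tID:aligner"); // hypothetical aligner @PG record
        mergeHeaders(unmapped, aligned).forEach(System.out::println);
    }
}
```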
-----------------------------------------------
STANDARD PICARD OPTIONS:
TMP_DIR=File Default value: /tmp/username. This option can be set to 'null' to clear the default value.
VERBOSITY=LogLevel Control verbosity of logging. Default value: INFO. This option can be set to 'null' to
clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}
QUIET=Boolean Whether to suppress job-summary info on System.err. Default value: false. This option
can be set to 'null' to clear the default value. Possible values: {true, false}
VALIDATION_STRINGENCY=ValidationStringency
Validation stringency for all SAM files read by this program. Setting stringency to
SILENT can improve performance when processing a BAM file in which variable-length data
(read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This
option can be set to 'null' to clear the default value. Possible values: {STRICT,
LENIENT, SILENT}
COMPRESSION_LEVEL=Integer Compression level for all compressed files created (e.g. BAM and GELI). Default value:
5. This option can be set to 'null' to clear the default value.
MAX_RECORDS_IN_RAM=Integer When writing SAM files that need to be sorted, this will specify the number of records
stored in RAM before spilling to disk. Increasing this number reduces the number of file
handles needed to sort a SAM file, and increases the amount of RAM needed. Default
value: 500000. This option can be set to 'null' to clear the default value.
CREATE_INDEX=Boolean Whether to create a BAM index when writing a coordinate-sorted BAM file. Default value:
false. This option can be set to 'null' to clear the default value. Possible values:
{true, false}
CREATE_MD5_FILE=Boolean Whether to create an MD5 digest for any BAM files created. Default value: false. This
option can be set to 'null' to clear the default value. Possible values: {true, false}
OPTIONS_FILE=File File of OPTION_NAME=value pairs. No positional parameters allowed. Unlike command-line
options, unrecognized options are ignored. A single-valued option set in an options file
may be overridden by a subsequent command-line option. A line starting with '#' is
considered a comment. This option may be specified 0 or more times.
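The OPTIONS_FILE rules above (OPTION_NAME=value lines, "#" comments, unrecognized options ignored, command-line values overriding single-valued options) can be illustrated with a small Java sketch. It is not Picard's own parser: the class, method names and the set of known options are hypothetical, and repeatable multi-valued options are not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OptionsFileSketch {
    // Parses OPTION_NAME=value lines. Lines starting with '#' are comments,
    // blank lines are skipped and options not in knownOptions are ignored.
    static Map<String, String> parse(List<String> lines, Set<String> knownOptions) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String raw : lines) {
            if (raw.startsWith("#") || raw.trim().isEmpty()) {
                continue;
            }
            int eq = raw.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not an OPTION_NAME=value pair: " + raw);
            }
            String name = raw.substring(0, eq).trim();
            if (knownOptions.contains(name)) {
                options.put(name, raw.substring(eq + 1).trim());
            }
        }
        return options;
    }

    // Single-valued options given on the command line override the options file.
    static Map<String, String> applyCommandLine(Map<String, String> fromFile, Map<String, String> fromCommandLine) {
        Map<String, String> merged = new LinkedHashMap<>(fromFile);
        merged.putAll(fromCommandLine);
        return merged;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("LANE", "OUTPUT", "COMPRESSION_LEVEL"));
        Map<String, String> fromFile = parse(Arrays.asList(
                "# lane 1, fast compression",
                "LANE=1",
                "COMPRESSION_LEVEL=1",
                "SOME_OTHER_OPTION=x"), known); // hypothetical unknown option, ignored
        Map<String, String> fromCommandLine = new LinkedHashMap<>();
        fromCommandLine.put("COMPRESSION_LEVEL", "5");
        System.out.println(applyCommandLine(fromFile, fromCommandLine)); // {LANE=1, COMPRESSION_LEVEL=5}
    }
}
```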