Configuration file
OPERA-MS can be run using a configuration file that specifies the paths to the input files and the options used for the assembly. The commands to process the test dataset are:
cd test_files
perl ../OPERA-MS.pl test.config 2> log.err
The configuration file is formatted as follows:
#One space between OPTION and VALUE
<OPTION1> <VALUE1>
<OPTION2> <VALUE2>
...
<OPTION3> <VALUE3>
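As an illustration of this layout, the sketch below writes a minimal configuration file and checks that every line follows the one-space `OPTION VALUE` rule. The file names (`example.config`, `sample_R1.fastq`, and so on) and the helper `check_config_format` are hypothetical and not part of OPERA-MS, and treating lines that start with `#` as comments is an assumption based on the template above.

```shell
# Hypothetical file names; replace them with your own paths.
cat > example.config <<'EOF'
#One space between OPTION and VALUE
ILLUMINA_READ_1 sample_R1.fastq
ILLUMINA_READ_2 sample_R2.fastq
LONG_READ sample_long_reads.fastq
OUTPUT_DIR opera_ms_output
EOF

# Hypothetical helper (not part of OPERA-MS): report any non-comment,
# non-empty line that is not "OPTION VALUE" separated by a single space.
check_config_format() {
    awk '!/^#/ && NF > 0 && $0 !~ /^[^ ]+ [^ ]+$/ {
             printf "line %d is not \"OPTION VALUE\": %s\n", NR, $0
             bad = 1
         }
         END { exit bad + 0 }' "$1"
}

check_config_format example.config && echo "example.config: format OK"
```

A check like this only validates the layout; it does not verify that the option names are recognised by OPERA-MS or that the paths exist.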
- ILLUMINA_READ_1 : path to the file containing the first reads of the Illumina paired-end data
- ILLUMINA_READ_2 : path to the file containing the second reads of the Illumina paired-end data
- LONG_READ : path to the long-read fastq file obtained from Oxford Nanopore, PacBio or Illumina Synthetic Long Read sequencing
- OUTPUT_DIR : directory where OPERA-MS results will be written
- REF_CLUSTERING : default: YES - whether reference-based clustering should be performed (YES) or skipped (NO)
- STRAIN_CLUSTERING : default: YES - whether strain-level clustering should be performed (YES) or skipped (NO)
- POLISHING : default: NO - whether short-read polishing (currently using Pilon) should be performed (YES) or skipped (NO). The polished contigs can be found in contigs.polished.fasta
- LONG_READ_MAPPER : default: blasr - software used for long-read mapping, either blasr or minimap2
- KMER_SIZE : default: 60 - k-mer size used to assemble contigs
- CONTIG_LEN_THR : default: 500 - contig length threshold for clustering; contigs shorter than CONTIG_LEN_THR will be filtered out
- CONTIG_EDGE_LEN : default: 80 - number of bases excluded from each contig end during contig coverage calculation, to avoid biases due to lower mapping efficiency
- CONTIG_WINDOW_LEN : default: 340 - length of the window in which coverage is estimated. We recommend using CONTIG_LEN_THR - 2 * CONTIG_EDGE_LEN as the value
- CONTIGS_FILE : path to the contigs file, if the short reads have already been assembled
- NUM_PROCESSOR : default: 2 - number of processors to use (note that 2 is the minimum).
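The recommended relation between CONTIG_WINDOW_LEN, CONTIG_LEN_THR and CONTIG_EDGE_LEN can be worked out with shell arithmetic. The sketch below is only an illustration: the helper name `recommended_window_len` is hypothetical, and the second call uses an arbitrary example threshold of 1000 rather than a value suggested by OPERA-MS.

```shell
# Hypothetical helper: CONTIG_LEN_THR - 2 * CONTIG_EDGE_LEN.
recommended_window_len() {
    contig_len_thr=$1
    contig_edge_len=$2
    echo $((contig_len_thr - 2 * contig_edge_len))
}

# With the defaults: 500 - 2 * 80 = 340, which matches the default CONTIG_WINDOW_LEN.
recommended_window_len 500 80

# Arbitrary example: CONTIG_LEN_THR raised to 1000, default edge length kept.
# Prints a line that could be pasted into the configuration file.
echo "CONTIG_WINDOW_LEN $(recommended_window_len 1000 80)"   # CONTIG_WINDOW_LEN 840
```

If CONTIG_LEN_THR or CONTIG_EDGE_LEN is changed in the configuration file, recomputing CONTIG_WINDOW_LEN this way keeps the three values consistent with the recommendation above.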