Sqanti3 wrapper script
The SQANTI3 wrapper is a Bash tool that runs the SQANTI3 QC, filter and rescue pipeline from an automated script with a reusable config file.
To run the SQANTI3 wrapper, you must provide a config file with all the paths and parameters needed to execute QC, filter, rescue or any combination of them. Most parameters have a default value. In most cases this is because they point to a file that is, or will be, generated automatically by a previous step. With the default values, the wrapper runs the example provided in the repository.
The arguments are classified into:
- General arguments, shared by two or more SQANTI3 steps
- Flow control, to skip steps at will
- Sqanti QC parameters
- Sqanti filter parameters
- Sqanti rescue parameters
- Custom values for particular executions. They usually do not need to be changed, but are available for particular setups (such as using Docker or Singularity).
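The config file is a list of Bash variable assignments, so a wrapper can load it with `source`. The sketch below shows this under that assumption. The `load_config` helper and the temporary file are hypothetical and are not taken from the SQANTI3 wrapper itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: load a config made of VAR="value" lines into the current shell.
load_config() {
    local config_file="$1"
    if [[ ! -f "$config_file" ]]; then
        echo "Config file not found: $config_file" >&2
        return 1
    fi
    # Variables assigned by the sourced file are global (they are not declared local).
    # shellcheck source=/dev/null
    source "$config_file"
}

# Write a tiny example config using default values documented on this page.
example_config="$(mktemp)"
cat > "$example_config" <<'EOF'
reference_gtf="example/gencode.v38.basic_chr22.gtf"
cpus="4"
QC_output_prefix="UHR_chr22"
EOF

load_config "$example_config"
echo "Running with ${cpus} CPUs, prefix ${QC_output_prefix}"
rm -f "$example_config"
```

Keeping the config as plain Bash means that no parser is needed. It also means the config can run arbitrary commands, so only load files you trust.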
- General arguments
Here you can specify the references used in the analysis, the number of cores to use, and the main parameters for the rules and ML methods of filter and rescue. By default, these values are shared by all the steps that use them.
reference_gtf="example/gencode.v38.basic_chr22.gtf"
reference_fasta="example/GRCh38.p13_chr22.fasta"
cpus="4"
json_for_rules="utilities/filter/filter_default.json" # json file with rules for filter and rescue rules
threshold="0.7"
- Flow control
QC, filter and rescue can be skipped by setting the corresponding variable to true. Do this if a step has already been computed, or if you are not interested in filter and/or rescue. Filter and rescue steps that are not skipped can run in rules, ml or both mode.
skip_qc="false" # true to skip the QC
skip_filter="true" # true to skip the filter
filter_mode="both" # rules, ml or both
skip_rescue="false" # true to skip the rescue
rescue_mode="both" # rules, ml or both
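The flags above decide which steps run. The sketch below shows one way a wrapper could turn them into a plan of steps. It is an illustration under that assumption, not the wrapper's actual code, and the `methods_for_mode` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical dispatcher: shows how skip_* flags and *_mode values could select steps.
skip_qc="false"
skip_filter="true"
filter_mode="both"
skip_rescue="false"
rescue_mode="both"

# Expand a mode (rules, ml or both) into the list of methods to run.
methods_for_mode() {
    case "$1" in
        rules) echo "rules" ;;
        ml)    echo "ml" ;;
        both)  echo "rules ml" ;;
        *)     echo "Invalid mode: $1 (expected rules, ml or both)" >&2; return 1 ;;
    esac
}

planned_steps=()
[[ "$skip_qc" == "true" ]] || planned_steps+=("qc")
if [[ "$skip_filter" != "true" ]]; then
    for m in $(methods_for_mode "$filter_mode"); do planned_steps+=("filter_$m"); done
fi
if [[ "$skip_rescue" != "true" ]]; then
    for m in $(methods_for_mode "$rescue_mode"); do planned_steps+=("rescue_$m"); done
fi

printf 'Planned step: %s\n' "${planned_steps[@]}"
```

With the values shown, filter is skipped but rescue still runs. Rescue therefore relies on filter outputs that must already exist from a previous run.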
- Sqanti QC:
SQANTI3 has many parameters, but only 3 of them are mandatory: the input GTF, the reference GTF and the reference FASTA. The references are defined in the general arguments section, so only the input GTF appears in this section. All other parameters are optional. The most common ones are preset to run the example dataset, while others have a default value in the script or can be used in different ways. For a detailed overview of the parameters, how they affect the pipeline and how to use them effectively, please check the SQANTI3 QC running tutorial or print the help with sqanti3_qc.py -h.
QC_input="example/UHR_chr22.gtf" # Input data
QC_output_prefix="UHR_chr22"
QC_destination_folder="/tmp/sqanti3_wrapper/QC/"
QC_min_ref_length="" # minimum reference transcript length. Default 0 bp
QC_force_id_ignore=""
QC_cage_peak_bed_file="data/ref_TSS_annotation/human.refTSS_v3.1.hg38.bed"
QC_aligner_choice="" # minimap2, deSALT, gmap or uLTRA
QC_polyA_motif_list="data/polyA_motifs/mouse_and_human.polyA_motif.txt"
QC_polyA_peak=""
QC_phylobed=""
QC_skipORF="true"
QC_is_fusion="false"
QC_orf_input=""
QC_is_fastq="false" # Required to be true if QC_input is a FASTQ file.
QC_expression_matrix=""
QC_gmap_index=""
QC_chunks=1 # (int) chunks > 1 activates parallelization
QC_coverage="" # Junctions coverage file
QC_sites=""
QC_window=""
QC_genename="" # Column name from GTF to define gene names
QC_full_length_pacbio_abundance_tsv="example/UHR_abundance.tsv"
QC_saturation="true"
QC_report_file="both" # pdf, html or both
QC_isoAnnotLite=""
QC_gff3=""
QC_short_reads_fofn="example/UHR_chr22_short_reads.fofn"
QC_SR_bam=""
QC_isoform_hits=""
QC_ratio_TSS_metric="" # Which metric should be reported in the ratio_TSS column
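Many of the QC parameters above are left empty. A wrapper can handle this by adding an option to the command only when its variable is non-empty. The sketch below assumes that approach. The option names are assumptions based on recent SQANTI3 releases, so confirm them with `sqanti3_qc.py -h` for your version. The `add_opt` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build the optional part of a sqanti3_qc.py call, skipping empty settings.
# Option names are assumptions; check them against `sqanti3_qc.py -h`.
QC_cage_peak_bed_file="data/ref_TSS_annotation/human.refTSS_v3.1.hg38.bed"
QC_aligner_choice=""
QC_polyA_motif_list="data/polyA_motifs/mouse_and_human.polyA_motif.txt"
QC_chunks=1

qc_args=()

# Append "<option> <value>" only when the value is non-empty.
add_opt() {
    local option="$1" value="$2"
    if [[ -n "$value" ]]; then
        qc_args+=("$option" "$value")
    fi
}

add_opt --CAGE_peak "$QC_cage_peak_bed_file"
add_opt --aligner_choice "$QC_aligner_choice"   # empty, so omitted
add_opt --polyA_motif_list "$QC_polyA_motif_list"
add_opt --chunks "$QC_chunks"

printf '%s\n' "${qc_args[@]}"
```

Boolean settings such as QC_skipORF would need a different helper, one that adds a bare flag when the value is true.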
- Sqanti filter:
SQANTI3 filter parameters fall into 3 categories: common arguments, rules-specific arguments and ML-specific arguments. Among the common parameters, the only mandatory argument is the input classification file. The default paths let you run the whole pipeline using the files generated in the QC step, but you can change them if needed. For a detailed overview of how these parameters affect the pipeline, please check the SQANTI3 filter running tutorial.
# Common elements for filter rules and ml
filter_input_classification="${QC_destination_folder}/${QC_output_prefix}_classification.txt"
filter_skip_report="false" # true to skip the report
filter_corrected_gtf="${QC_destination_folder}/${QC_output_prefix}_corrected.gtf"
filter_isoforms="GMST/GMST_tmp.faa"
filter_isoannotgff3="example/SQANTI3_QC_output/UHR_chr22.gff3"
filter_sam=""
filter_faa="${QC_destination_folder}/${QC_output_prefix}_corrected.faa"
filter_monoexonic="true"
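The default filter inputs are built from the QC variables by ordinary Bash parameter expansion. The example below uses the default values on this page. It shows that the trailing slash in QC_destination_folder produces a double slash, which POSIX path resolution treats as a single one. Stripping the slash is optional and is only a cosmetic choice.

```bash
#!/usr/bin/env bash
# Show how the default filter input expands from the QC settings.
QC_output_prefix="UHR_chr22"
QC_destination_folder="/tmp/sqanti3_wrapper/QC/"

filter_input_classification="${QC_destination_folder}/${QC_output_prefix}_classification.txt"
echo "$filter_input_classification"
# prints /tmp/sqanti3_wrapper/QC//UHR_chr22_classification.txt (still a valid path)

# Optional tidy-up: strip one trailing slash from the folder before joining.
clean_path="${QC_destination_folder%/}/${QC_output_prefix}_classification.txt"
echo "$clean_path"
# prints /tmp/sqanti3_wrapper/QC/UHR_chr22_classification.txt
```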
SQANTI3 filter rules needs only 2 extra arguments, both optional: the output folder and the prefix for the generated files. If they are not set, default values are used and existing files may be overwritten, so setting them is strongly recommended.
filter_rules_ouput_folder="/tmp/sqanti3_wrapper/sqanti3_filter_rules"
filter_rules_prefix="${QC_output_prefix}"
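If the output folder or prefix is reused, earlier results can be overwritten. The guard below is a hypothetical sketch and is not part of the wrapper. It creates the folder and returns a distinct status when files with the same prefix are already there. The `touch` line only simulates a previous run.

```bash
#!/usr/bin/env bash
# Hypothetical guard: create the output folder and warn before reusing a prefix.
# Returns 0 if the folder is ready, 2 if files with the prefix already exist.
prepare_output_dir() {
    local dir="$1" prefix="$2"
    mkdir -p "$dir" || return 1
    # compgen -G succeeds if at least one path matches the glob.
    if compgen -G "${dir}/${prefix}*" > /dev/null; then
        echo "Warning: ${dir} already has files starting with ${prefix}; they may be overwritten" >&2
        return 2
    fi
    return 0
}

filter_rules_ouput_folder="$(mktemp -d)/sqanti3_filter_rules"
filter_rules_prefix="UHR_chr22"

prepare_output_dir "$filter_rules_ouput_folder" "$filter_rules_prefix"
first_status=$?
touch "${filter_rules_ouput_folder}/${filter_rules_prefix}_example.txt"   # simulate a previous run
prepare_output_dir "$filter_rules_ouput_folder" "$filter_rules_prefix" 2> /dev/null
second_status=$?
echo "first: $first_status, second: $second_status"
```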
SQANTI3 filter ML provides a few more parameters to tune its execution, beyond the output folder and prefix:
filter_ml_ouput_folder="/tmp/sqanti3_wrapper/sqanti3_filter_ml"
filter_ml_prefix="${QC_output_prefix}"
filter_ml_percent_training="0.8"
filter_ml_TP="" # Path to the file that contains TP. If empty, will be calculated
filter_ml_TN="" # Path to the file that contains TN. If empty, will be calculated
filter_ml_threshold=${threshold}
filter_ml_remove_columns=""
filter_ml_intermediate_files=""
filter_ml_max_class_size=""
filter_ml_intrapriming=""
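filter_ml_percent_training and filter_ml_threshold are written as fractions. The sketch below is a hypothetical sanity check that rejects non-numeric values and values outside the open interval (0, 1) before filter ML is called. It assumes both settings are meant to lie strictly between 0 and 1, which this page does not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only for a number strictly between 0 and 1.
is_fraction() {
    awk -v x="$1" 'BEGIN {
        if (x !~ /^[0-9]*\.?[0-9]+$/) exit 1   # reject empty or non-numeric input
        exit !(x > 0 && x < 1)
    }'
}

threshold="0.7"
filter_ml_percent_training="0.8"
filter_ml_threshold=${threshold}

for name in filter_ml_percent_training filter_ml_threshold; do
    # ${!name} is Bash indirect expansion: the value of the variable named by $name.
    if is_fraction "${!name}"; then
        echo "$name=${!name} OK"
    else
        echo "$name=${!name} is not a fraction between 0 and 1" >&2
    fi
done
```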
- Sqanti rescue:
Rescue parameters are fully split into rules and ML, as each method uses the output of the corresponding filter method.
Rescue rules parameters include some parameters already used by QC and filter. For a complete overview of rescue's parameters, please check or ask the script for help with
rescue_rules_output_prefix="${QC_output_prefix}"
rescue_rules_output_folder="/tmp/sqanti3_wrapper/sqanti3_rescue_rules"
rescue_rules_filtered_classification="${filter_rules_ouput_folder}/${filter_rules_prefix}_RulesFilter_result_classification.txt" # Output classification from filter step
rescue_rules_reference_classification="${QC_destination_folder}/${QC_output_prefix}_classification.txt" # The output classification from QC step
rescue_rules_isoforms="${QC_destination_folder}/${QC_output_prefix}_corrected.fasta" # Corrected isoforms from QC step
rescue_rules_gtf="${filter_rules_ouput_folder}/${filter_rules_prefix}.filtered.gtf" # Filtered GTF from filter step
rescue_rules_monoexons="all" # Keep monoexons
rescue_rules_mode="full"
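Rescue depends on files produced by QC and filter, so a missing input usually means an earlier step was skipped or failed. The pre-flight check below is a hypothetical sketch. It takes variable names, looks up their values, and reports every missing or empty file at once. It assumes the rescue_rules_* variables from the config are already loaded.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: each argument is the NAME of a variable holding a path.
check_inputs() {
    local missing=() var
    for var in "$@"; do
        # -s: the file exists and is not empty.
        if [[ ! -s "${!var}" ]]; then
            missing+=("$var=${!var}")
        fi
    done
    if (( ${#missing[@]} > 0 )); then
        printf 'Missing or empty input: %s\n' "${missing[@]}" >&2
        return 1
    fi
}

check_inputs rescue_rules_filtered_classification rescue_rules_reference_classification \
             rescue_rules_isoforms rescue_rules_gtf || echo "Run QC and filter rules first."
```

The same call works for the rescue_ml_* inputs listed below.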
Conveniently, rescue ML also uses files generated by QC and filter. The only extra (optional) parameter is the randomforest.RData file generated by filter ML.
rescue_ml_output_prefix="${QC_output_prefix}"
rescue_ml_output_folder="/tmp/sqanti3_wrapper/sqanti3_rescue_ml"
rescue_ml_filtered_classification="${filter_ml_ouput_folder}/${filter_ml_prefix}_MLresult_classification.txt"
rescue_ml_reference_classification="${QC_destination_folder}/${QC_output_prefix}_classification.txt"
rescue_ml_isoforms="${QC_destination_folder}/${QC_output_prefix}_corrected.fasta"
rescue_ml_gtf="${filter_ml_ouput_folder}/${filter_ml_prefix}.filtered.gtf"
rescue_ml_monoexons="all"
rescue_ml_mode="full"
rescue_ml_randomforest_rdata="${filter_ml_ouput_folder}/randomforest.RData"
- Custom values
These values have been set to allow running the whole pipeline, to share references and common elements between the different steps, and to specify the SQANTI3 version you want to use.
Select the SQANTI3 installation you prefer. If you wish, you can use the Docker image (or the equivalent Singularity image) by replacing the python3 command with a docker run command.
sqanti3_qc="python3 sqanti3_qc.py"
sqanti3_filter="python3 sqanti3_filter.py"
sqanti3_rescue="python3 sqanti3_rescue.py"
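Each command above is stored as a single string. Presumably the wrapper expands it unquoted so that Bash splits it into words, although this is an assumption about the wrapper's internals. That works for simple commands but breaks if a path contains spaces, whereas a Bash array keeps every word intact. In the sketch, the Docker image name is a placeholder, sqanti3_qc.py is assumed to be on the image's PATH, and the `run` helper only prints the command instead of executing it.

```bash
#!/usr/bin/env bash
# String form, as in the config: relies on word splitting when expanded unquoted.
sqanti3_qc="python3 sqanti3_qc.py"
read -r -a qc_words <<< "$sqanti3_qc"
echo "string form splits into ${#qc_words[@]} words"

# Array form: each element stays one argument, even with spaces in a path.
sqanti3_qc_cmd=(python3 "/opt/my tools/sqanti3_qc.py")
echo "array form has ${#sqanti3_qc_cmd[@]} words"

# Docker variant: only the prefix changes. IMAGE is a placeholder, not a real image name.
IMAGE="<sqanti3-image>"
sqanti3_qc_docker=(docker run --rm -v "$PWD:$PWD" -w "$PWD" "$IMAGE" sqanti3_qc.py)

# Dry run: print the shell-quoted command instead of executing it.
run() { printf '%q ' "$@"; echo; }
run "${sqanti3_qc_docker[@]}" --help
```

Mounting the working directory at the same path inside the container lets relative paths from the config resolve identically in both environments.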
If you need or want to run QC, filter or rescue with different references or files, you can change these values. This is not recommended, because it breaks the consistency of the execution. In such cases, creating a new config is preferable to customizing these values.
QC_reference_gtf=${reference_gtf}
QC_reference_fasta=${reference_fasta}
QC_cpus=${cpus}
filter_rules_json_file=${json_for_rules}
rescue_rules_reference_genome=${reference_fasta}
rescue_rules_reference_gtf=${reference_gtf}
rescue_rules_json_file="${json_for_rules}"
rescue_ml_reference_genome=${reference_fasta}
rescue_ml_reference_gtf=${reference_gtf}
rescue_ml_threshold=${threshold}
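These are plain Bash assignments, so each step-specific variable copies the general value at the moment its line runs. The short example below illustrates this with cpus and QC_cpus. To change all steps, edit the general value above this block. To change a single step, set its own variable after it.

```bash
#!/usr/bin/env bash
# Step-specific values copy the general ones when they are assigned.
cpus="4"
QC_cpus=${cpus}

# Changing cpus afterwards does not update QC_cpus: assignment order matters.
cpus="8"
echo "cpus=${cpus} QC_cpus=${QC_cpus}"   # prints cpus=8 QC_cpus=4

# To override one step only, set its own variable after the defaults.
QC_cpus="2"
echo "QC_cpus=${QC_cpus}"   # prints QC_cpus=2
```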