
Running SQANTI3 rescue

Ángeles Arzalluz-Luque edited this page Jun 17, 2022 · 33 revisions

⚠️UNDER CONSTRUCTION⚠️

Introduction

As of version 5.1, a third module has been added to the SQANTI3 workflow for transcriptome characterization and quality control. The SQANTI3 rescue algorithm is designed to be run after transcriptome filtering. It uses the long-read evidence provided by discarded isoforms (i.e. artifacts) to avoid losing transcripts/genes that are detected as expressed but whose start, end or junctions could not be confidently validated using orthogonal data. Specifically, during the rescue SQANTI3 tries to confidently assign each discarded artifact to its best-matching reference transcript. As a result, SQANTI3 rescue generates an expanded transcriptome GTF that includes a set of reference transcripts (those assigned to rescued artifacts) together with the long read-defined isoforms that passed the filter.
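To make the composition of the rescued transcriptome concrete, here is a minimal Python sketch. It is not SQANTI3 source code and only works on transcript IDs, not on GTF records.

It assumes the following:

- Filter outcomes are labelled "Isoform" or "Artifact".
- The best-matching reference transcript for each artifact has already been found. The `rescued_matches` dictionary is a hypothetical input.
- A reference transcript matched by several artifacts is added only once.

```python
from typing import Dict, List, Optional


def expanded_transcriptome(filter_results: Dict[str, str],
                           rescued_matches: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical sketch: IDs that would make up the rescued transcriptome.

    filter_results: isoform ID -> "Isoform" (passed) or "Artifact" (discarded).
    rescued_matches: artifact ID -> best-matching reference transcript ID,
        or None when no confident match was found (assumed, precomputed input).
    """
    # Long-read isoforms that passed the filter are kept unchanged.
    kept = sorted(iso for iso, result in filter_results.items() if result == "Isoform")
    # Reference transcripts assigned to discarded artifacts are added once each,
    # even when several artifacts point to the same reference transcript.
    rescued = sorted({
        ref for artifact, ref in rescued_matches.items()
        if ref is not None and filter_results.get(artifact) == "Artifact"
    })
    return kept + rescued


results = {"PB.1.1": "Isoform", "PB.2.1": "Artifact",
           "PB.3.1": "Artifact", "PB.4.1": "Artifact"}
matches = {"PB.2.1": "refA", "PB.3.1": "refA", "PB.4.1": None}
print(expanded_transcriptome(results, matches))  # ['PB.1.1', 'refA']
```

In this toy example, two artifacts are assigned to the same reference transcript, so "refA" appears once. The artifact without a confident match (PB.4.1) contributes nothing.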

As with the SQANTI3 filter, the SQANTI3 rescue has a dual implementation, depending on whether the rules filter or the machine learning (ML) filter was run previously. Therefore, sqanti3_rescue.py requires a subcommand, either ml or rules, to activate the corresponding rescue mode.

usage: sqanti3_rescue.py [-h] {ml,rules} ...

Rescue artifacts discarded by the SQANTI3 filter, i.e. find closest match for
the artifacts in the reference transcriptome and add them to the
transcriptome.

positional arguments:
  {ml,rules}

optional arguments:
  -h, --help  show this help message and exit
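When the rescue is launched from a Python pipeline, it can help to assemble the argument list explicitly. The sketch below only builds the command and does not run it.

It uses only the options listed in the per-mode help texts on this page. A few parts are assumptions:

- The function name `build_rescue_cmd` is hypothetical.
- All file names are hypothetical placeholders.
- Invoking the script as `python sqanti3_rescue.py` is an assumed convention.

Long option names are used on purpose. The short flag `-j` means `--json` in rules mode but `--threshold` in ml mode.

```python
import shlex
from typing import List, Optional


def build_rescue_cmd(mode: str, filter_classif: str, isoforms: str, gtf: str,
                     ref_gtf: str, ref_genome: str, ref_classif: str,
                     output: str = "rescue", out_dir: str = ".",
                     rescue_mono_exonic: str = "all",
                     rules_json: Optional[str] = None,
                     randomforest: Optional[str] = None,
                     threshold: Optional[float] = None) -> List[str]:
    """Hypothetical helper: assemble (but do not run) a sqanti3_rescue.py command."""
    if mode not in ("rules", "ml"):
        raise ValueError("mode must be 'rules' or 'ml'")
    # Options shared by both subcommands.
    cmd = ["python", "sqanti3_rescue.py", mode,
           "--isoforms", isoforms, "--gtf", gtf,
           "--refGTF", ref_gtf, "--refGenome", ref_genome,
           "--refClassif", ref_classif,
           "--rescue_mono_exonic", rescue_mono_exonic,
           "--output", output, "--dir", out_dir]
    # Mode-specific options (long forms avoid the ambiguous short flag -j).
    if mode == "rules" and rules_json is not None:
        cmd += ["--json", rules_json]
    if mode == "ml":
        if randomforest is not None:
            cmd += ["--randomforest", randomforest]
        if threshold is not None:
            cmd += ["--threshold", str(threshold)]
    # Positional argument: classification file written by the SQANTI3 filter.
    cmd.append(filter_classif)
    return cmd


cmd = build_rescue_cmd(
    "ml", "sample_filter_classification.txt",
    isoforms="sample_corrected.fasta", gtf="sample.filtered.gtf",
    ref_gtf="reference.gtf", ref_genome="genome.fasta",
    ref_classif="reference_classification.txt",
    randomforest="randomforest.RData", threshold=0.7)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Keeping the command as a list, rather than a single string, avoids shell-quoting problems with paths that contain spaces.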

Motivation

To be completed: further justify why transcript rescue is required after filtering.

Rescue strategy in SQANTI3

TBC: step-by-step description of the rescue strategy implemented in SQANTI3. Decision tree for rescue.

Rules filter rescue

These are the arguments accepted by sqanti3_rescue.py rules:

usage: sqanti3_rescue.py rules [-h] [--isoforms ISOFORMS] [--gtf GTF] [-g REFGTF] 
                               [-f REFGENOME] [-k REFCLASSIF]
                               [-e {all,fsm,none}] [-o OUTPUT] [-d DIR] 
                               [--skip_report] [-v] [-j JSON]
                               sqanti_filter_classif

Rescue for rules-filtered transcriptomes.

positional arguments:
  sqanti_filter_classif
                        SQANTI filter (ML or rules) output classification file.

optional arguments:
  -h, --help            show this help message and exit
  --isoforms ISOFORMS   FASTA file output by SQANTI3 QC (*_corrected.fasta), i.e. the full long read transcriptome.
  --gtf GTF             GTF file output by SQANTI3 filter (*.filtered.gtf).
  -g REFGTF, --refGTF REFGTF
                        Full path to reference transcriptome GTF used when running SQANTI3 QC.
  -f REFGENOME, --refGenome REFGENOME
                        Full path to reference genome FASTA used when running SQANTI3 QC.
  -k REFCLASSIF, --refClassif REFCLASSIF
                        Full path to the classification file obtained when running SQANTI3 QC on the reference
                        transcriptome.
  -e {all,fsm,none}, --rescue_mono_exonic {all,fsm,none}
                        Whether or not to include mono-exonic artifacts in the rescue. Options include: none, fsm and all
                        (default).
  -o OUTPUT, --output OUTPUT
                        Prefix for output files.
  -d DIR, --dir DIR     Directory for output files. Default: Directory where the script was run.
  --skip_report         Skip creation of a report about the filtering
  -v, --version         Display program version number.
  -j JSON, --json JSON  Full path to the JSON file including the rules used when running the SQANTI3 rules filter.
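The `--rescue_mono_exonic` option (`-e`) controls whether mono-exonic artifacts enter the rescue. The sketch below is one reading of that help text, not SQANTI3's implementation:

- `none` excludes every mono-exonic artifact.
- `fsm` keeps only mono-exonic artifacts classified as full-splice matches.
- `all` keeps them all.

The sketch also assumes a tab-separated classification table with SQANTI3-style columns `isoform`, `structural_category`, `exons` and `filter_result`. The function name `select_rescue_candidates` is hypothetical.

```python
import csv
import io
from typing import List

VALID_CHOICES = {"all", "fsm", "none"}


def select_rescue_candidates(classif_tsv: str, rescue_mono_exonic: str = "all") -> List[str]:
    """Hypothetical sketch: IDs of discarded artifacts that would enter the rescue."""
    if rescue_mono_exonic not in VALID_CHOICES:
        raise ValueError(f"invalid choice: {rescue_mono_exonic!r}")
    candidates = []
    for row in csv.DictReader(io.StringIO(classif_tsv), delimiter="\t"):
        if row["filter_result"] != "Artifact":
            continue  # isoforms that passed the filter do not need rescuing
        if int(row["exons"]) == 1:
            if rescue_mono_exonic == "none":
                continue  # drop every mono-exonic artifact
            if rescue_mono_exonic == "fsm" and row["structural_category"] != "full-splice_match":
                continue  # keep only mono-exonic artifacts that are FSM
        candidates.append(row["isoform"])
    return candidates


rows = [
    ("isoform", "structural_category", "exons", "filter_result"),
    ("PB.1.1", "full-splice_match", "1", "Artifact"),
    ("PB.2.1", "incomplete-splice_match", "1", "Artifact"),
    ("PB.3.1", "novel_in_catalog", "4", "Artifact"),
    ("PB.4.1", "full-splice_match", "5", "Isoform"),
]
example = "\n".join("\t".join(fields) for fields in rows)
for choice in ("all", "fsm", "none"):
    print(choice, select_rescue_candidates(example, choice))
# all ['PB.1.1', 'PB.2.1', 'PB.3.1']
# fsm ['PB.1.1', 'PB.3.1']
# none ['PB.3.1']
```

Multi-exonic artifacts such as PB.3.1 are candidates under every setting. Only the mono-exonic ones are affected by the option.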

Machine learning filter rescue

These are the arguments accepted by sqanti3_rescue.py ml:

usage: sqanti3_rescue.py ml [-h] [--isoforms ISOFORMS] [--gtf GTF] [-g REFGTF] 
                            [-f REFGENOME] [-k REFCLASSIF]
                            [-e {all,fsm,none}] [-o OUTPUT] [-d DIR] 
                            [--skip_report] [-v] [-r RANDOMFOREST] [-j THRESHOLD]
                            sqanti_filter_classif

Rescue for ML-filtered transcriptomes.

positional arguments:
  sqanti_filter_classif
                        SQANTI filter (ML or rules) output classification file.

optional arguments:
  -h, --help            show this help message and exit
  --isoforms ISOFORMS   FASTA file output by SQANTI3 QC (*_corrected.fasta), i.e. the full long read transcriptome.
  --gtf GTF             GTF file output by SQANTI3 filter (*.filtered.gtf).
  -g REFGTF, --refGTF REFGTF
                        Full path to reference transcriptome GTF used when running SQANTI3 QC.
  -f REFGENOME, --refGenome REFGENOME
                        Full path to reference genome FASTA used when running SQANTI3 QC.
  -k REFCLASSIF, --refClassif REFCLASSIF
                        Full path to the classification file obtained when running SQANTI3 QC on the reference
                        transcriptome.
  -e {all,fsm,none}, --rescue_mono_exonic {all,fsm,none}
                        Whether or not to include mono-exonic artifacts in the rescue. Options include: none, fsm and all
                        (default).
  -o OUTPUT, --output OUTPUT
                        Prefix for output files.
  -d DIR, --dir DIR     Directory for output files. Default: Directory where the script was run.
  --skip_report         Skip creation of a report about the filtering
  -v, --version         Display program version number.
  -r RANDOMFOREST, --randomforest RANDOMFOREST
                        Full path to the randomforest.RData object obtained when running the SQANTI3 ML filter.
  -j THRESHOLD, --threshold THRESHOLD
                        Default: 0.7. Machine learning probability threshold to filter elegible rescue targets (mapping
                        hits).

Rescue output files