
Introduction to SQANTI3

Ángeles Arzalluz-Luque edited this page Apr 27, 2022 · 6 revisions

Introduction

SQANTI3 is the newest version of the SQANTI tool. SQANTI3 combines features from the original SQANTI and from SQANTI2, as well as newly implemented functionalities and transcript features.

Disclaimer: please note that, although still available, both SQANTI and SQANTI2 are deprecated and will no longer be maintained or updated. All development efforts will continue in SQANTI3, which aims to provide the most comprehensive characterization of long read-defined transcriptomes for the community.

SQANTI3 and the Functional IsoTranscriptomics (FIT) pipeline

SQANTI3 constitutes the first module of the Functional IsoTranscriptomics (FIT) pipeline, which also includes IsoAnnot and tappAS. SQANTI3 is currently designed to perform two different tasks, both of them equally important:

  1. Isoform classification and quality control for long read-defined transcriptomes. The SQANTI3 categories and subcategories, together with a long list of transcript-level attributes and descriptors, allow users to carefully inspect the properties of their isoform models and to identify potential problems generated during library preparation and raw data processing.
  2. Artifact filtering for long read-defined transcriptomes. Using the large number of descriptors calculated by SQANTI3, users can make informed decisions to remove potential false positive isoforms from their transcriptomes. This is particularly relevant considering the biases and pitfalls of current long read sequencing protocols.

To gain insight into these two steps, we encourage reading the original SQANTI publication.

A good curation of the transcriptome is indispensable to proceed with FIT analysis. Downstream steps include:

  • Functional annotation of isoform models, including positionally-defined functional features such as motifs and domains. IsoAnnot, a tool for de novo annotation of isoforms, is currently under development; in the meantime, users can run IsoAnnotLite to impute functional features from other already-annotated transcriptomes.
  • Expression-based functional analysis using tappAS. tappAS is a Java GUI application that leverages both expression and domain/motif annotation information to gain insight into the functional implications of alternative isoform expression.

Before running SQANTI3: recommended long read processing workflow

Here is our recommended workflow, including the best way to generate the SQANTI3 inputs and how to proceed after QC and filtering:

  1. Sample pooling: while we are aware that some users may have long read data from several replicates and/or samples, we recommend pooling all long read samples to build a single transcriptome per experiment.
  2. Long read data processing using your preferred transcriptome-building tool. We do not recommend using SQANTI3 on raw long reads, as it is NOT designed as a tool for long read data QC.
  3. Collapse of isoform models. Typically, long read data processing pipelines generate a large number of highly redundant isoform models. We recommend collapsing these using tools such as cDNA_Cupcake or TAMACollapse to reduce the number of isoforms and create unique isoform models prior to running SQANTI3.
  4. Quality control and filtering: we strongly encourage users to inspect their long read-defined transcriptomes as carefully as possible and to filter them to remove potential false positive isoforms, which are abundant in long read-generated transcriptomes.
  5. Quantification of the filtered transcriptome using short/long reads and your preferred tool. We do not recommend using the expression estimates supplied to SQANTI3 as input for downstream analysis: these are used for quality control purposes only. Once all artifacts are removed from the transcriptome, the reads can be used to obtain a more accurate quantification.
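
To make the collapse step (step 3) more concrete, here is a minimal Python sketch of one common idea behind redundancy removal: multi-exon transcripts on the same chromosome and strand that share an identical intron chain are merged into a single model, keeping the outermost 5' and 3' ends. This is only an illustration under simplifying assumptions. It is not how cDNA_Cupcake or TAMACollapse are implemented, and the function names (`intron_chain`, `collapse`) and the input layout are hypothetical.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the intron chain of a transcript given its (start, end) exons."""
    exons = sorted(exons)
    # An intron runs from the end of one exon to the start of the next.
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def collapse(transcripts):
    """Merge transcripts that share chromosome, strand and intron chain.

    `transcripts` maps a transcript ID to (chrom, strand, exons), a
    hypothetical layout chosen for this sketch. Mono-exonic transcripts
    have no intron chain and are kept as they are.
    Returns {representative_id: (chrom, strand, exons, member_ids)}.
    """
    groups = defaultdict(list)
    for tid, (chrom, strand, exons) in transcripts.items():
        chain = intron_chain(exons)
        # Use the transcript ID as the key for mono-exonic models so that
        # they are never merged with each other.
        key = (chrom, strand, chain) if chain else (chrom, strand, tid)
        groups[key].append((tid, sorted(exons)))

    collapsed = {}
    for (chrom, strand, _), members in groups.items():
        ids = sorted(tid for tid, _ in members)
        exons = [list(e) for e in members[0][1]]
        # Internal exon boundaries are identical within a group, so only
        # the outer transcript ends need to be reconciled.
        exons[0][0] = min(ex[0][0] for _, ex in members)
        exons[-1][1] = max(ex[-1][1] for _, ex in members)
        collapsed[ids[0]] = (chrom, strand, [tuple(e) for e in exons], ids)
    return collapsed


# Toy input: t1 and t2 differ only at their ends; t3 skips the middle exon.
toy = {
    "t1": ("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
    "t2": ("chr1", "+", [(120, 200), (300, 400), (500, 650)]),
    "t3": ("chr1", "+", [(100, 200), (500, 600)]),
}
unique_models = collapse(toy)
for rep_id, model in unique_models.items():
    print(rep_id, model)
```

In this toy input, `t1` and `t2` end up in one model and `t3` stays separate because its intron chain differs. Real collapsing tools additionally handle alignment errors, 5' degradation and tolerance windows at splice sites, none of which is attempted here.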

How does SQANTI3 work?

SQANTI3 is a tool for in-depth characterization of isoforms obtained by full-length transcript sequencing, which are commonly returned in FASTA or GTF format. SQANTI3 combines the long read-defined transcripts with the reference annotation, as well as with other orthogonal data, to provide a wide range of descriptors of transcript quality. SQANTI3 generates a comprehensive report to facilitate quality control and filtering of the isoform models.
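
As a rough illustration of what comparing a transcript with the reference annotation involves, the sketch below assigns a multi-exon query transcript to one of four of the main SQANTI structural categories based on its splice junctions alone: full-splice match (FSM), incomplete-splice match (ISM), novel in catalog (NIC) and novel not in catalog (NNC). It is a deliberately simplified toy, not SQANTI3's actual code: it assumes that the query and all reference transcripts lie on the same chromosome and strand, it ignores the remaining categories and all subcategories, and the function names are hypothetical.

```python
def junctions(exons):
    """List the splice junctions (intron start, intron end) of a transcript."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def is_contiguous_subchain(query, ref):
    """True if `query` appears as a consecutive run of junctions in `ref`."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def classify(query_exons, reference):
    """Toy junction-based classification of one multi-exon transcript.

    `reference` maps a reference transcript ID to its exon list; same
    chromosome and strand are assumed for every entry (sketch assumption).
    """
    q = junctions(query_exons)
    if not q:
        return "mono-exon (not handled in this sketch)"
    ref_chains = [junctions(e) for e in reference.values()]

    # FSM: every junction matches one reference transcript, in order.
    if any(q == r for r in ref_chains):
        return "FSM"
    # ISM: matches a consecutive part of a reference junction chain.
    if any(is_contiguous_subchain(q, r) for r in ref_chains):
        return "ISM"
    # NIC: only known splice sites, but in a combination not annotated.
    known_starts = {s for r in ref_chains for s, _ in r}
    known_ends = {e for r in ref_chains for _, e in r}
    if all(s in known_starts and e in known_ends for s, e in q):
        return "NIC"
    # NNC: at least one splice site is absent from the reference.
    return "NNC"


# Toy reference with two annotated transcripts of the same gene.
toy_reference = {
    "R1": [(100, 200), (300, 400), (500, 600)],
    "R2": [(100, 200), (350, 400), (500, 600)],
}
toy_queries = {
    "q_fsm": [(90, 200), (300, 400), (500, 620)],
    "q_ism": [(300, 400), (500, 600)],
    "q_nic": [(100, 200), (500, 600)],
    "q_nnc": [(100, 200), (300, 400), (700, 800)],
}
labels = {name: classify(ex, toy_reference) for name, ex in toy_queries.items()}
print(labels)
```

Note that the transcript ends play no role in this sketch: `q_fsm` has different start and end coordinates from `R1` and is still labeled FSM because its junction chain is identical.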

Figure: SQANTI3 workflow