Low alignment rate #472

MarcelloMalpighi · 2024-12-21T11:17:25Z

Hi,
I quantified a paired-end Smart-seq2 RNA-seq dataset (SRR3936136) using kallisto v0.48.0 with the GENCODE v47 transcriptome. The fragment length was estimated with CollectInsertSizeMetrics (Picard) from the STAR BAM file. The alignment rate was 34.2% in paired-end mode and 45.6%/43.6% in single-end mode (read 1/read 2), compared to 74.93% with STAR. Why is there such a notable discrepancy in the alignment rate? The relevant information is provided below.
kallisto paired-end run_info.json:

{
        "n_targets": 387944,
        "n_bootstraps": 0,
        "n_processed": 1981171,
        "n_pseudoaligned": 676920,
        "n_unique": 68993,
        "p_pseudoaligned": 34.2,
        "p_unique": 3.5,
        "kallisto_version": "0.48.0",
        "index_version": 10,
        "start_time": "Sat Dec 21 00:02:34 2024",
        "call": "kallisto quant --pseudobam --genomebam --single-overhang -l 302.431195 -s 138.561984 -i /home/Usersdata2/references/gencode/kallistoIndex/gencode.v47.primary_assembly_k15.index -o /home/Usersdata2/abundance/SRR3936136 -t 32 --gtf /home/Usersdata2/references/gencode/gencode.v47.primary_assembly.annotation.gtf --chromosomes /home/Usersdata2/references/gencode/kallistoIndex/gencode.v47.primary_assembly.annotation.chromosomeInfo.txt /home/Usersdata2/datasets/SRP079058/fastq/SRR3936136_1.fastq.gz /home/Usersdata2/datasets/SRP079058/fastq/SRR3936136_2.fastq.gz"
}

kallisto single-end run_info.json (read 1, then read 2):

{
        "n_targets": 387944,
        "n_bootstraps": 0,
        "n_processed": 1981171,
        "n_pseudoaligned": 903816,
        "n_unique": 84772,
        "p_pseudoaligned": 45.6,
        "p_unique": 4.3,
        "kallisto_version": "0.48.0",
        "index_version": 10,
        "start_time": "Sat Dec 21 19:35:34 2024",
        "call": "kallisto quant --pseudobam --genomebam --single-overhang -l 302.431 -s 138.562 -i /home/Usersdata2/references/gencode/kallistoIndex/gencode.v47.primary_assembly_k15.index -o /home/Usersdata2/kallisto/testSingle/read1 -t 16 --single --gtf /home/Usersdata2/references/gencode/gencode.v47.primary_assembly.annotation.gtf --chromosomes /home/Usersdata2/references/gencode/kallistoIndex/gencode.v47.primary_assembly.annotation.chromosomeInfo.txt /home/Usersdata2/datasets/SRP079058/fastq/SRR3936136_1.fastq.gz"
}

{
        "n_targets": 387944,
        "n_bootstraps": 0,
        "n_processed": 1981171,
        "n_pseudoaligned": 862909,
        "n_unique": 81911,
        "p_pseudoaligned": 43.6,
        "p_unique": 4.1,
        "kallisto_version": "0.48.0",
        "index_version": 10,
        "start_time": "Sat Dec 21 19:38:53 2024",
        "call": "kallisto quant --pseudobam --genomebam --single-overhang -l 302.431 -s 138.562 -i /home/Usersdata2/references/gencode/kallistoIndex/gencode.v47.primary_assembly_k15.index -o /home/Usersdata2/kallisto/testSingle/read2 -t 16 --single --gtf /home/Usersdata2/references/gencode/gencode.v47.primary_assembly.annotation.gtf --chromosomes /home/Usersdata2/references/gencode/kallistoIndex/gencode.v47.primary_assembly.annotation.chromosomeInfo.txt /home/Usersdata2/datasets/SRP079058/fastq/SRR3936136_2.fastq.gz"
}

STAR Log.final.out

                                 Started job on |       Dec 21 00:31:43
                             Started mapping on |       Dec 21 00:35:30
                                    Finished on |       Dec 21 00:36:24
       Mapping speed, Million of reads per hour |       132.08

                          Number of input reads |       1981171
                      Average input read length |       128
                                    UNIQUE READS:
                   Uniquely mapped reads number |       665253
                        Uniquely mapped reads % |       33.58%
                          Average mapped length |       127.44
                       Number of splices: Total |       142473
            Number of splices: Annotated (sjdb) |       142473
                       Number of splices: GT/AG |       140613
                       Number of splices: GC/AG |       1251
                       Number of splices: AT/AC |       64
               Number of splices: Non-canonical |       545
                      Mismatch rate per base, % |       0.37%
                         Deletion rate per base |       0.01%
                        Deletion average length |       1.44
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.25
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       819140
             % of reads mapped to multiple loci |       41.35%
        Number of reads mapped to too many loci |       3
             % of reads mapped to too many loci |       0.00%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       552
       % of reads unmapped: too many mismatches |       0.03%
            Number of reads unmapped: too short |       494129
                 % of reads unmapped: too short |       24.94%
                Number of reads unmapped: other |       2094
                     % of reads unmapped: other |       0.11%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Thanks in advance.
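
As a quick sanity check on the figures in this post, the reported percentages can be recomputed from the raw counts. This is only a minimal sketch: the counts are copied by hand from the run_info.json files and the STAR Log.final.out above, the helper name `pct` is made up for illustration, and STAR's overall rate is assumed to be uniquely mapped plus multi-mapped reads.

```python
# Counts copied from the run_info.json files and STAR Log.final.out above.
N_PROCESSED = 1981171  # same number of input reads (pairs) in every run

kallisto_pseudoaligned = {
    "paired-end": 676920,
    "single-end, read 1": 903816,
    "single-end, read 2": 862909,
}

star_unique = 665253  # "Uniquely mapped reads number"
star_multi = 819140   # "Number of reads mapped to multiple loci"


def pct(count, total=N_PROCESSED):
    """Return count as a percentage of total (hypothetical helper)."""
    return 100.0 * count / total


kallisto_rates = {mode: pct(n) for mode, n in kallisto_pseudoaligned.items()}

# Assumption: STAR's overall rate = uniquely mapped + multi-mapped reads.
star_rate = pct(star_unique + star_multi)

for mode, rate in kallisto_rates.items():
    print(f"kallisto {mode}: {rate:.1f}%")
print(f"STAR unique + multi-mapped: {star_rate:.2f}%")
```

The recomputed values agree with the percentages quoted in the post, so the gap between kallisto and STAR is in the counts themselves rather than in how the rates were derived.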


mschilli87 · 2024-12-21T15:32:22Z

What fraction of your STAR alignments is intergenic or overlapping introns?

MarcelloMalpighi · 2024-12-22T06:08:51Z

I used the following command to count the reads that STAR aligned to the reference transcriptome, then subtracted that count from the total number of reads. The result indicates that 52.96% of the reads did not align to annotated transcripts, i.e. were not derived from exons. Does this imply that the alignment rate is relatively low because a significant proportion of reads originate from non-exonic regions?
>samtools view SRR3936136_Aligned.toTranscriptome.out.bam | awk '{print $1}' | sort | uniq | wc -l
>932000
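
The subtraction described above can be written out explicitly. The sketch below is illustrative and rests on one assumption: that the 932000 unique read names from the transcriptome BAM are directly comparable to STAR's "Number of input reads" (each pair counted once). It also splits the remainder into reads STAR did not map at all and reads that mapped to the genome but not to an annotated transcript, since both end up in the 52.96%.

```python
# Numbers copied from this thread; variable names are illustrative only.
total_reads = 1981171         # "Number of input reads" in STAR Log.final.out
transcriptome_reads = 932000  # unique read names in Aligned.toTranscriptome.out.bam
star_mapped = 665253 + 819140 # uniquely mapped + multi-mapped reads

# Reads without any alignment to an annotated transcript.
not_in_transcriptome = total_reads - transcriptome_reads
pct_not_in_transcriptome = 100.0 * not_in_transcriptome / total_reads

# Split that remainder into two parts (assumes the counts are comparable):
unmapped = total_reads - star_mapped                   # not mapped by STAR at all
mapped_elsewhere = star_mapped - transcriptome_reads   # genome only, e.g. intronic/intergenic

pct_unmapped = 100.0 * unmapped / total_reads
pct_mapped_elsewhere = 100.0 * mapped_elsewhere / total_reads

print(f"no transcriptome alignment: {pct_not_in_transcriptome:.2f}%")
print(f"  unmapped by STAR:         {pct_unmapped:.2f}%")
print(f"  mapped outside annotated transcripts: {pct_mapped_elsewhere:.2f}%")
```

Under that assumption, only part of the 52.96% reflects non-exonic alignments; the rest are reads STAR could not place anywhere.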

mschilli87 · 2024-12-22T17:02:21Z

I would assume so. Kallisto usually (pseudo)aligns to annotated spliced transcripts only. So your comparison to STAR is not really apples to apples.
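
A closer like-for-like comparison is to set kallisto's pseudoalignment rate against the fraction of reads STAR places on annotated transcripts, rather than against STAR's genome-wide rate. The sketch below uses only counts quoted in this thread; treating the samtools read-name count as STAR's transcriptome-level rate is an assumption, and the variable names are illustrative.

```python
# Counts quoted earlier in this thread.
n_reads = 1981171
star_transcriptome_reads = 932000  # from the samtools/awk count (assumed comparable)

kallisto_pseudoaligned = {
    "paired-end": 676920,
    "single-end, read 1": 903816,
    "single-end, read 2": 862909,
}

# STAR rate restricted to annotated transcripts.
star_tx_rate = 100.0 * star_transcriptome_reads / n_reads

# Remaining gap, in percentage points, between STAR (transcriptome) and kallisto.
gaps = {
    mode: star_tx_rate - 100.0 * n / n_reads
    for mode, n in kallisto_pseudoaligned.items()
}

print(f"STAR, transcriptome only: {star_tx_rate:.2f}%")
for mode, gap in gaps.items():
    print(f"gap vs kallisto {mode}: {gap:.2f} percentage points")
```

On this basis the single-end runs land within a few percentage points of STAR, while the paired-end run still falls noticeably short.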

MarcelloMalpighi · 2024-12-24T12:29:42Z

The original literature indicates that it is a standard single-cell RNA-seq library.
