Longshot on Iso-seq data #106

camelest · 2024-08-28T06:54:27Z

Hi, thank you so much for the great tool.

I'm trying to apply longshot on long-read RNA-seq data (Iso-seq data from PacBio HiFi reads).
I ran the pipeline with default options and got ~30% reads assigned with HP tags (Is is true to assume HP:i:1 represents reads with REF SNPs and HP:i:2 with ALT SNPs?)
Do you have any recommendations to recover more reads?

I was wondering whether I should loosen

-I, --max_cigar_indel <int>                Throw away a read-variant during allelotyping if there is a CIGAR indel
                                               (I/D/N) longer than this amount in its window. [default: 20]

since the RNA-seq reads CIGAR skips large introns.

Thank you so much for your kind help.

The text was updated successfully, but these errors were encountered:

vibansal · 2024-09-05T16:03:46Z

One way to avoid excluding these reads would be to modify the bam/cram file to split the reads containing long 'N' cigar operations into multiple reads. See for example:

https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads

camelest · 2024-10-01T00:39:43Z

@vibansal Thank you so much for your reply. Sorry for my late reply, I somehow did not receive the notice.

I tried with SplitNCigarReads. For some regions, even though the reads look pretty nice as a SNP, it got filtered out at the final step. (included in 4.0.final_genotypes.vcf but not in the final result) Does it have something to do with other settings?

The output in 4.0.final_genotypes.vcf is

chrX    103482802       .       T       G       1       dp      DP=9850;AM=0;MC=0;MF=0.000;MB=0.000;AQ=25.62;GM=0;PH=0.00,37.78,37.78,37.78;SC=None;    GT:GQ:DP:AD:PS:UG:UQ    0/0:34:9850:5355,4495:.:0/1:127.63

vibansal · 2024-10-18T16:40:19Z

Sorry for the late reply. The variant is filtered since it has very high depth (9850). The "-max_cov" option can be used to increase the threshold.

camelest · 2024-10-20T11:04:16Z

Thank you so much for the reply. I tried with --max_cov 100000 and now the dp changed to PASS but I still don't find it in the final output.
Does the final quality filtering occur at this step?

(in 4.0.final_genotypes.vcf)

chrX    103482802       .       T       G       1       PASS    DP=9850;AM=0;MC=0;MF=0.000;MB=0.000;AQ=25.62;GM=0;PH=0.00,37.78,37.78,37.78;SC=None;    GT:GQ:DP:AD:PS:UG:UQ    0/0:34:9850:5355,4495:.:0/1:127.63

vibansal · 2024-11-07T23:33:10Z

The genotype of the variant is 0/0 and the final output contains only variants with non-ref genotypes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Longshot on Iso-seq data #106

Longshot on Iso-seq data #106

camelest commented Aug 28, 2024

vibansal commented Sep 5, 2024

camelest commented Oct 1, 2024

vibansal commented Oct 18, 2024

camelest commented Oct 20, 2024 •

edited

Loading

vibansal commented Nov 7, 2024

Longshot on Iso-seq data #106

Longshot on Iso-seq data #106

Comments

camelest commented Aug 28, 2024

vibansal commented Sep 5, 2024

camelest commented Oct 1, 2024

vibansal commented Oct 18, 2024

camelest commented Oct 20, 2024 • edited Loading

vibansal commented Nov 7, 2024

camelest commented Oct 20, 2024 •

edited

Loading