Problem when running Longshot with too few reads/too small coverage using -A #72

Coppini · 2021-08-23T13:24:03Z

I'm having problems with Longshot when running the Artic pipeline (even after updating Longshot to v0.4.3, since Artic still uses v0.4.1), in this command:

longshot \
    -P 0 \
    -F -A \
    --no_haps \
    --bam barcode91.primertrimmed.rg.sorted.bam \
    --ref primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.reference.fasta \
    --out barcode91.merged.vcf \
    --potential_variants barcode91.merged.vcf.gz

The BAM file I'm using there has only one read, with few bases, resulting in a very small coverage. This was expected, since these were negative samples, where we shouldn't be finding (Sars-Cov-2) reads.
Longshot ends up erroring with the following messages:

2021-08-23 10:03:32 Automatically determining max read coverage.
2021-08-23 10:03:32 Estimating mean read coverage...
2021-08-23 10:03:32 WARNING: Max coverage calculation is highly likely to be incorrect. The number of reference bases covered by the bam file (857) differs significantly from the expected number of positions in the reference (29903). If you are using a bam file that only covers part of the genome, please specify this region exactly with the --region argument so the number of reference bases is known. Alternatively, disable maximum coverage filtering by setting -C to a large number.
2021-08-23 10:03:32 Total reference positions: 29903
2021-08-23 10:03:32 Total bases in bam: 857
2021-08-23 10:03:32 Mean read coverage: 0.03
error: {} ERROR: Max read coverage set to 0. printing empty VCF file

It seems the problem comes from the -A flag:

-A, --auto_max_cov        Automatically calculate mean coverage for region and set max coverage to mean_coverage +
                              5*sqrt(mean_coverage). (SLOWER)

Since Mean read coverage = 0.03, it calculates Max coverage = 0.03+5*sqrt(0.03) = 0.89602540378, and rounds down to 0.

Would it be possible to have -A round the maximum coverage up, instead of down?

The text was updated successfully, but these errors were encountered:

vibansal · 2021-08-27T20:05:35Z

Longshot uses the input reads to estimate sequencing error rates so it will not work well for files with very few reads. Can you not use the -A flag?

Coppini · 2021-08-30T13:41:50Z

Yes, I believe that would be better as well. I'm using the existing Artic pipeline as mentioned, but already opened an issue with them concerning this problem. artic-network/fieldbioinformatics#91

Thank you either way.

This was referenced Aug 23, 2021

Error with longshot when using too few reads/too small coverage artic-network/artic-ncov2019#76

Closed

Error when running with too few reads/very low coverage artic-network/fieldbioinformatics#91

Closed

This was referenced Jun 29, 2022

feat: make flag -A in Longshot optional artic-network/fieldbioinformatics#112

Closed

feat: flag -A in Longshot is now optional mendelics/fieldbioinformatics#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem when running Longshot with too few reads/too small coverage using -A #72

Problem when running Longshot with too few reads/too small coverage using -A #72

Coppini commented Aug 23, 2021 •

edited

Loading

vibansal commented Aug 27, 2021

Coppini commented Aug 30, 2021

Problem when running Longshot with too few reads/too small coverage using -A #72

Problem when running Longshot with too few reads/too small coverage using -A #72

Comments

Coppini commented Aug 23, 2021 • edited Loading

vibansal commented Aug 27, 2021

Coppini commented Aug 30, 2021

Coppini commented Aug 23, 2021 •

edited

Loading