Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can INDEL_WINDOW_SIZE be increased in bcftools for longer inserts? #1569

Closed
jrharting opened this issue Aug 31, 2021 · 7 comments
Closed

can INDEL_WINDOW_SIZE be increased in bcftools for longer inserts? #1569

jrharting opened this issue Aug 31, 2021 · 7 comments

Comments

@jrharting
Copy link

We're looking at ~700bp inserts using PacBio HiFi data using bcftools and we noticed some large indels (>100 or so bp) missing from the callset. I think this is due to the indel window size here
https://github.com/samtools/bcftools/blob/develop/bam2bcf_indel.c#L40

Is there any reason not to increase this to something like 400-500bp?

I've attached
a toy example i made that models some data I can't share at the moment.

screenshot is generated using:

BAM=synthetic_reads/800_reads.mapped.bam

bcftools mpileup --open-prob 25 \
                 --gap-frac 0.01 \
                 --ext-prob 1 \
                 --min-ireads 3 \
                 --max-depth 1000 \
                 --max-idepth 5000 \
                 -h 500 \
                 -B \
                 -a FORMAT/AD \
                 -f NC_045512.2.fasta \
                 -r NC_045512.2:26438-28428 \
                 2>/dev/null \
                 $BAM


image (10)
image (9)
800_reads.bam.gz
reference.fasta.gz

@pd3 pd3 closed this as completed in 4c71bfb Sep 1, 2021
@pd3
Copy link
Member

pd3 commented Sep 1, 2021

I just pushed a commit that adds a new --indel-size option which allows to override the default value. Some benchmarking should be done before this is made the new default with -X ont and -X pacbio-ccs.

Thanks for reporting the issue and the test case.

@jrharting
Copy link
Author

Thank you very much!

@zeeev
Copy link

zeeev commented Sep 1, 2021

Thanks @pd3, just to double check, the changes are on develop?

@pd3
Copy link
Member

pd3 commented Sep 1, 2021

@zeeev That's correct 4c71bfb

@mrmrwinter
Copy link

Hi,
I'm trying to use bcftools stats to call stats on a vcf, and I'm trying to use the --indel-size flag to capture indels much larger than 60bp.

I'm definitely working from the develop branch but I'm getting errors when trying to use the --indel-size flag.

stats: unrecognized option '--indel-size'

git branch is develop
git rev-parse HEAD gives f2694e52e2af7e60c3129b9ec82f07b5d3e8288b

Any help would be greatly appreciated.

Thanks

@pd3
Copy link
Member

pd3 commented Jul 6, 2022

This thread is about a different command, mpileup not stats. Please open a new issue / feature request so that this is not forgotten.

@mrmrwinter
Copy link

Hi, my fault, i realise now.

Ive ran bcftools mpileup on my sam file with the --indel-size flag, followed by bcftools stats and plot-vcfstats, but i find no indels, whereas i've previously found them in the same alignment using other variant calling methods. I do however need to use this package with the --indel-size flag as I'm interested in indels of all sizes, not just ~60 bp as the standard seems to be for bcftools stats.

My aim is to align two assembly scaffolds with lastz, then generate a vcf and call indels. From there i can determine distribution of indel sizes in the alignment.

My process was:
1, Align two assembly scaffolds with LastZ and output a sam file
2, Converted sam to bam and sorted
3, Ran bcftools mpileup --indel-size 1000 -f asm.fa lastz_alignment.sorted.bam > output.lastz.vcf
4, Ran bcftools stats on the output.lastz.vcf
5, Ran plot-vcfstats on the output of bcftools stats

I did get some errors when running bcftools stats:

[W::vcf_parse] INFO 'D' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'Q' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'MQ0' is not defined in the header, assuming Type=String
[W::vcf_parse] FILTER '*' is not defined in the header

My question is why is bcftools not detecting or reporting and indels?

Sorry if this is not the right place to ask; I will open my own ticket if needs be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants