bcftools norm with --multi-overlaps . outputs different variants depending on allele order in the input variant #2160

Closed
astaric opened this issue Apr 15, 2024 · 1 comment


astaric commented Apr 15, 2024

Using

bcftools norm -m - --multi-overlaps . test.vcf

on the following input

##fileformat=VCFv4.2
##reference=ref.fasta
##contig=<ID=1,length=51304566>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A	B
1	511	.	C	G,T	.	.	.	GT	1|0	0|1
1	511	.	C	T,G	.	.	.	GT	2|0	0|2

produces the following variants:

1	511	.	C	G	.	.	.	GT	1|0	0|1
1	511	.	C	T	.	.	.	GT	.|.	.|.
1	511	.	C	T	.	.	.	GT	.|0	0|.
1	511	.	C	G	.	.	.	GT	1|.	.|1

I would expect both input records to produce the same output records, since they differ only in the order of the ALT alleles. By contrast, bcftools norm --atomize --atom-overlaps produces the same records regardless of the ALT allele order in the input:

1	511	.	C	G	.	.	.	GT	1|0	0|1
1	511	.	C	T	.	.	.	GT	.|0	0|.
1	511	.	C	G	.	.	.	GT	1|0	0|1
1	511	.	C	T	.	.	.	GT	.|0	0|.
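The expected behaviour can be made concrete with a minimal Python sketch of how genotypes might be rewritten when a multiallelic record is split with --multi-overlaps . (a period). It models the semantics described in this report and is not bcftools code. The function names split_gt and split_record are hypothetical. The sketch assumes GT strings of integer allele indices separated by | or /. REF stays 0, the selected ALT becomes 1, and any other ALT becomes missing.

```python
# Hypothetical model of the genotype rewriting expected from
# `bcftools norm -m - --multi-overlaps .`. Illustrative only; not taken from vcfnorm.c.


def split_gt(gt: str, alt_index: int) -> str:
    """Rewrite one GT (e.g. "2|0") for the biallelic record of ALT number alt_index (1-based)."""
    sep = "|" if "|" in gt else "/"
    out = []
    for allele in gt.split(sep):
        if allele == ".":
            out.append(".")          # missing stays missing
        elif int(allele) == 0:
            out.append("0")          # REF stays REF in every split record
        elif int(allele) == alt_index:
            out.append("1")          # the ALT this record represents
        else:
            out.append(".")          # overlapping ALT -> missing (--multi-overlaps .)
    return sep.join(out)


def split_record(ref: str, alts: list[str], gts: list[str]) -> list[tuple[str, str, tuple[str, ...]]]:
    """Split one multiallelic record into (REF, ALT, GTs) biallelic records."""
    return [
        (ref, alt, tuple(split_gt(gt, i) for gt in gts))
        for i, alt in enumerate(alts, start=1)
    ]


if __name__ == "__main__":
    # The two input records from the report: same alleles, different ALT order.
    a = split_record("C", ["G", "T"], ["1|0", "0|1"])
    b = split_record("C", ["T", "G"], ["2|0", "0|2"])
    for rec in a + b:
        print(rec)
    # Sorting by ALT makes the comparison independent of the input ALT order.
    assert sorted(a) == sorted(b)
```

The rule uses only each allele's index and never the record's position in the output, so the result does not depend on ALT order. Sorted by ALT, both input records give G with 1|0 0|1 and T with .|0 0|., which matches the --atomize output above.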

I tracked the difference down to this line of code:
https://github.com/samtools/bcftools/blob/develop/vcfnorm.c#L875
It keeps REF alleles (0) as REF only in the first split record. In every subsequent record it sets them to missing (.).
Is there a specific reason for treating REF alleles in the first output record differently?
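The difference can be reproduced with a small Python model in which a keep_ref flag is true only for the first split record (buggy=True) or for every record (buggy=False). This is a sketch based on the behaviour described above. It is not the actual C code at vcfnorm.c#L875, and the names split_gt, split_record, keep_ref and buggy are hypothetical.

```python
# Hypothetical model of the reported behaviour: with buggy=True, REF alleles
# are kept as 0 only in the first split record and become "." afterwards.
# This mimics the observed output; it is not the C code in vcfnorm.c.


def split_gt(gt: str, alt_index: int, keep_ref: bool) -> str:
    """Rewrite one GT for the biallelic record of ALT number alt_index (1-based)."""
    sep = "|" if "|" in gt else "/"
    out = []
    for allele in gt.split(sep):
        if allele == ".":
            out.append(".")
        elif int(allele) == 0:
            out.append("0" if keep_ref else ".")
        elif int(allele) == alt_index:
            out.append("1")
        else:
            out.append(".")  # overlapping ALT under --multi-overlaps .
    return sep.join(out)


def split_record(
    ref: str, alts: list[str], gts: list[str], buggy: bool
) -> list[tuple[str, str, tuple[str, ...]]]:
    records = []
    for i, alt in enumerate(alts, start=1):
        # Buggy rule: REF survives only in the first output record (i == 1).
        keep_ref = (i == 1) if buggy else True
        records.append((ref, alt, tuple(split_gt(gt, i, keep_ref) for gt in gts)))
    return records


if __name__ == "__main__":
    inputs = [("C", ["G", "T"], ["1|0", "0|1"]), ("C", ["T", "G"], ["2|0", "0|2"])]
    for ref, alts, gts in inputs:
        for r, alt, split_gts in split_record(ref, alts, gts, buggy=True):
            # Print in the same CHROM/POS/ID/REF/ALT/QUAL/FILTER/INFO/FORMAT layout as the report.
            print("\t".join(["1", "511", ".", r, alt, ".", ".", ".", "GT", *split_gts]))
```

With buggy=True, the loop prints the same four records as the problematic bcftools norm output shown earlier. With buggy=False, REF alleles survive in every record and the two inputs agree once sorted by ALT.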

pd3 closed this as completed in 5977f1f on Apr 27, 2024
pd3 (Member) commented Apr 27, 2024

Mmm, it was a mistake. Thank you for reporting the bug and providing the test case; it is fixed now.
