You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There are some errors in the decomposed VCFs, which is likely caused by vcfwave:
Some of the decomposed variants have incorrect genotypes and INFO
The decomposed INFO/AT do not match the alleles. It looks like vcfwave treat this field as Number=A instead of Number=R
To quickly replicate this issue:
bcftools view -H -r chr22:11021446-11021447 -s HG01109 hprc-v1.1-mc-chm13.raw.vcf.gz
chr22 11021446 >55070592>55070595 TT T,TTT 60 . AC=1,0;AF=0.0857143,0.0571429;AN=2;AT=>55070592>55070593>55070595,>55070592>55070595,>55070592>55070593>55070594>55070595;NS=44;LV=0 GT 0|1
bcftools view -H -r chr22:11021446-11021447 -s HG01109 hprc-v1.1-mc-chm13.vcfbub.a100k.wave.vcf.gz
[W::bcf_calc_ac] Incorrect number of AC fields at chr22:11021447. (This message is printed only once.)
chr22 11021446 >55070592>55070595_1 TT T 60 . AC=1;AF=0.085714;AN=2;AT=>55070592>55070593>55070595;NS=44;LV=0;ORIGIN=chr22:11021446;LEN=1;TYPE=del GT 0|1
chr22 11021447 >55070592>55070595_2 T TT 60 . AC=1;AF=0.0857143,0.0571429;AN=2;AT=>55070592>55070593>55070595,>55070592>55070595,>55070592>55070593>55070594>55070595;NS=44;LV=0;ORIGIN=chr22:11021446;LEN=1;TYPE=ins GT 0|1
Genotype error:
The >55070592>55070595 record has 2 alternative alleles, the first one is deletion (TT -> T) and the second one is insertion (TT -> TTT). In raw VCF, HG01109 has one copy of the deletion on haplotype2. While in the decomposed VCF, HG01109 has both the deletion and insertion on haplotype2.
INFO error
The AT in decomposed VCF don't match the input VCF. Moreover, it seems all INFO for >55070592>55070595_2 are still the same as the raw VCF.
I also report this issue to vcfwave, you may refer to vcflib/vcflib#403 for details .
Besides, this issue was initially identified when I tried to run this Pangenie pipeline.
The error records are reported in graph.vcf.log like below. In chr22, there are 3111 errors.
Two overlapping variants at same haplotype at chr22:11021446, set allele to missing.
Two overlapping variants at same haplotype at chr22:11023348, set allele to missing.
Two overlapping variants at same haplotype at chr22:11028129, set allele to missing.
...
The text was updated successfully, but these errors were encountered:
Han-Cao
changed the title
Incorrect INFO and genotypes in the decomposed VCFs
Incorrect INFO and genotypes in HPRCv1.1 decomposed VCFs
Jul 5, 2024
Hi @glennhickey ,
There are some errors in the decomposed VCFs, which is likely caused by vcfwave:
INFO
INFO/AT
do not match the alleles. It looks like vcfwave treat this field asNumber=A
instead ofNumber=R
To quickly replicate this issue:
Genotype error:
The
>55070592>55070595
record has 2 alternative alleles, the first one is deletion (TT -> T) and the second one is insertion (TT -> TTT). In raw VCF, HG01109 has one copy of the deletion on haplotype2. While in the decomposed VCF, HG01109 has both the deletion and insertion on haplotype2.INFO error
The
AT
in decomposed VCF don't match the input VCF. Moreover, it seems all INFO for>55070592>55070595_2
are still the same as the raw VCF.I also report this issue to vcfwave, you may refer to vcflib/vcflib#403 for details .
Besides, this issue was initially identified when I tried to run this Pangenie pipeline.
The error records are reported in
graph.vcf.log
like below. In chr22, there are 3111 errors.The text was updated successfully, but these errors were encountered: