You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At this position:
HaplotypeCaller outputs this to the VCF: 9 418269 . TTTTG T 366.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.359;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.90;MQRankSum=2.404;QD=11.11;ReadPosRankSum=0.041;SOR=0.914 GT:AD:DP:GQ:PL 0/1:21,12:33:99:404,0,760
9 418272 . T C 509.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.075;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.90;MQRankSum=-2.152;QD=14.56;ReadPosRankSum=-0.236;SOR=0.784 GT:AD:DP:GQ:PL 0/1:14,21:35:99:538,0,505
The SNP/spanning deletion site is at position 418272. Notice HaplotypeCaller assigns a genotype of 0/1 (T/C). The genotype should really be 1/2 (C/*) with the * allele as the other alternate allele.
Functional equivalence of GATK4 ports. That's my party line for now. Didn't realize you had actually started on this... but yeah, 3.7 is going out today.
@davidbenjamin Any thoughts on this? Laura thought she implemented a fix but it have any effect. I am becoming familiar with the HC code but it's slow going. While stepping though the code, I noticed it correctly identifies the events for the deletion and SNP.
@ronlevine I would put in a few breakpoints to see where the spanning deletion allele gets lost from the SNP site VariantContext. I just ran things with the new qual out of curiosity and the bug persists, so it's probably not an issue of previous bugs with multiallelic sites.
@davidbenjamin That's what I have been doing. I could not find where the spanning deletion alleles was ever added. I agree with Laura's code that the spanning deletion alleles should be added to the events in HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods.
@davidbenjamin It looks like the issue is with ReferenceConfidenceModel.getOverlappingVariantContext(final GenomeLoc curPos, final Collection<VariantContext> maybeOverlapping), with the stack trace:
For curPos=9:418272, there are 2 variants, 9:418269-418273 TTTTG*,<NON_REF>,T and 9:418272 T*,<NON_REF>,C. This method returns the variant with the right-most start , so the variant with the deletion is ignored. This logic should be changed so that the variants that overlap curPos are merged and returned. A utility similar to GATKVariantContextUtils.simpleMerge might work.
I thought the state I left that branch was that it would output a merged
representation of the two events starting at the deletion position- so the
reference is some string of bases, allele 1 is the deletion, and allele 2
matches the reference for the length of the deletion with the exception of
the SNP. (Allele 2 is not the minimal representation yet.) That's the first
step to get the genotype right. After that we need to break up the events,
clean up the representation, and assign the genotype from the combined
event to both of them.
Hopefully that helps. (And hopefully I actually committed the version of
the branch that does what I said.)
@davidbenjaminhttps://github.com/davidbenjamin It looks like the issue
is with ReferenceConfidenceModel.getOverlappingVariantContext(final
GenomeLoc curPos, final Collection maybeOverlapping),
with the stack trace:
For curPos=9:418272, there are 2 variants, 9:418269-418273
TTTTG*,<NON_REF>,T and 9:418272 T*,<NON_REF>,C. This method returns the
variant with the right-most start , so the variant with the deletion is
ignored. This logic should be changed so that the variants that overlap
curPos are merged and returned. A utility such as GATKVariantContextUtils.
simpleMerge might work.
@cwhelan mentioned that he plans to take a stab at fixing this, so I've added him as a tentative co-assignee. He'll send it back to us if the fix proves too hairy.
It would be nice if the fix also worked for genotype-given-alleles mode in the case where you are given a spanning deletion and a SNP to genotype in the input alleles. I will try to make sure that works.
HC GGA mode will, I guess, not really return the right result if the input alleles include only the SNP but not its spanning deletion.
@vdauwera commented on Wed Mar 22 2017
@chandrans commented on Mon Oct 24 2016
Bug Report
Affected tool(s)
HaplotypeCaller
Affected version(s)
3.6
Description
At this position:
HaplotypeCaller outputs this to the VCF:
9 418269 . TTTTG T 366.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.359;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.90;MQRankSum=2.404;QD=11.11;ReadPosRankSum=0.041;SOR=0.914 GT:AD:DP:GQ:PL 0/1:21,12:33:99:404,0,760
9 418272 . T C 509.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.075;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.90;MQRankSum=-2.152;QD=14.56;ReadPosRankSum=-0.236;SOR=0.784 GT:AD:DP:GQ:PL 0/1:14,21:35:99:538,0,505
The SNP/spanning deletion site is at position 418272. Notice HaplotypeCaller assigns a genotype of 0/1 (T/C). The genotype should really be 1/2 (C/*) with the * allele as the other alternate allele.
Steps to reproduce
Commands and files below.
This Issue was generated from your forums
@chandrans commented on Mon Oct 24 2016
Test Files here:
/humgen/gsa-scr1/schandra/bgrenier_MixingAndMatchingGVCFAndBPRES
Command for HaplotypeCaller:
java -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T HaplotypeCaller -R /humgen/gsa-hpprojects/GATK/bundle/2.8/b37/human_g1k_v37.fasta -I /humgen/gsa-scr1/schandra/bgrenier_MixingAndMatchingGVCFAndBPRES/ind2.bam -o Sheila.HaplotypeCaller.vcf -L 9 -bamout Sheila.HaplotypeCaller.bam
HaplotypeCaller in GVCF mode:
java -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T HaplotypeCaller -R /humgen/gsa-hpprojects/GATK/bundle/2.8/b37/human_g1k_v37.fasta -I /humgen/gsa-scr1/schandra/bgrenier_MixingAndMatchingGVCFAndBPRES/ind2.bam -L 9 -o Sheila.HaplotypeCallerGVCF.g.vcf -ERC GVCF
GenotypeGVCFs:
java -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T GenotypeGVCFs -R /humgen/gsa-hpprojects/GATK/bundle/2.8/b37/human_g1k_v37.fasta -V Sheila.HaplotypeCallerGVCF.g.vcf -o Sheila.GenotypeGVCFsGVCF.vcf
@chandrans commented on Sat Dec 03 2016
User asked about this. Probably won't get to it anytime soon, but I told him/her I would check in.
@vdauwera commented on Mon Dec 05 2016
I responded that we're waiting on the tie-outs.
@ldgauthier commented on Tue Dec 06 2016
Tie-outs?
I started on this, but I definitely won't finish it before the release.
@vdauwera commented on Tue Dec 06 2016
Functional equivalence of GATK4 ports. That's my party line for now. Didn't realize you had actually started on this... but yeah, 3.7 is going out today.
@ronlevine commented on Mon Jan 23 2017
@davidbenjamin Any thoughts on this? Laura thought she implemented a fix but it have any effect. I am becoming familiar with the HC code but it's slow going. While stepping though the code, I noticed it correctly identifies the events for the deletion and SNP.
@davidbenjamin commented on Mon Jan 23 2017
@ronlevine I would put in a few breakpoints to see where the spanning deletion allele gets lost from the SNP site
VariantContext
. I just ran things with the new qual out of curiosity and the bug persists, so it's probably not an issue of previous bugs with multiallelic sites.@ronlevine commented on Mon Jan 23 2017
@davidbenjamin That's what I have been doing. I could not find where the spanning deletion alleles was ever added. I agree with Laura's code that the spanning deletion alleles should be added to the events in
HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods
.@ronlevine commented on Tue Jan 24 2017
@davidbenjamin It looks like the issue is with
ReferenceConfidenceModel.getOverlappingVariantContext(final GenomeLoc curPos, final Collection<VariantContext> maybeOverlapping)
, with the stack trace:For
curPos=9:418272
, there are 2 variants,9:418269-418273 TTTTG*,<NON_REF>,T
and9:418272 T*,<NON_REF>,C
. This method returns the variant with the right-most start , so the variant with the deletion is ignored. This logic should be changed so that the variants that overlapcurPos
are merged and returned. A utility similar toGATKVariantContextUtils.simpleMerge
might work.@ldgauthier commented on Wed Mar 01 2017
I thought the state I left that branch was that it would output a merged
representation of the two events starting at the deletion position- so the
reference is some string of bases, allele 1 is the deletion, and allele 2
matches the reference for the length of the deletion with the exception of
the SNP. (Allele 2 is not the minimal representation yet.) That's the first
step to get the genotype right. After that we need to break up the events,
clean up the representation, and assign the genotype from the combined
event to both of them.
Hopefully that helps. (And hopefully I actually committed the version of
the branch that does what I said.)
On Jan 24, 2017 12:06 PM, "Ron Levine" [email protected] wrote:
The text was updated successfully, but these errors were encountered: