Skip to content
This repository has been archived by the owner on Nov 9, 2019. It is now read-only.

HaplotypeCaller outputs incorrect genotype at spanning deletion/SNP site #952

Closed
vdauwera opened this issue Mar 23, 2017 · 1 comment
Closed

Comments

@vdauwera
Copy link
Contributor

@chandrans commented on Mon Oct 24 2016

Bug Report

Affected tool(s)

HaplotypeCaller

Affected version(s)

3.6

Description

At this position:
screen shot 2016-10-24 at 3 55 16 pm
HaplotypeCaller outputs this to the VCF:
9 418269 . TTTTG T 366.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.359;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.90;MQRankSum=2.404;QD=11.11;ReadPosRankSum=0.041;SOR=0.914 GT:AD:DP:GQ:PL 0/1:21,12:33:99:404,0,760

9 418272 . T C 509.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.075;ClippingRankSum=0.000;DP=36;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.90;MQRankSum=-2.152;QD=14.56;ReadPosRankSum=-0.236;SOR=0.784 GT:AD:DP:GQ:PL 0/1:14,21:35:99:538,0,505

The SNP/spanning deletion site is at position 418272. Notice HaplotypeCaller assigns a genotype of 0/1 (T/C). The genotype should really be 1/2 (C/*) with the * allele as the other alternate allele.

Steps to reproduce

Commands and files below.

This Issue was generated from your forums


@chandrans commented on Mon Oct 24 2016

Test Files here:
/humgen/gsa-scr1/schandra/bgrenier_MixingAndMatchingGVCFAndBPRES

Command for HaplotypeCaller:
java -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T HaplotypeCaller -R /humgen/gsa-hpprojects/GATK/bundle/2.8/b37/human_g1k_v37.fasta -I /humgen/gsa-scr1/schandra/bgrenier_MixingAndMatchingGVCFAndBPRES/ind2.bam -o Sheila.HaplotypeCaller.vcf -L 9 -bamout Sheila.HaplotypeCaller.bam

HaplotypeCaller in GVCF mode:
java -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T HaplotypeCaller -R /humgen/gsa-hpprojects/GATK/bundle/2.8/b37/human_g1k_v37.fasta -I /humgen/gsa-scr1/schandra/bgrenier_MixingAndMatchingGVCFAndBPRES/ind2.bam -L 9 -o Sheila.HaplotypeCallerGVCF.g.vcf -ERC GVCF

GenotypeGVCFs:
java -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T GenotypeGVCFs -R /humgen/gsa-hpprojects/GATK/bundle/2.8/b37/human_g1k_v37.fasta -V Sheila.HaplotypeCallerGVCF.g.vcf -o Sheila.GenotypeGVCFsGVCF.vcf


@chandrans commented on Sat Dec 03 2016

User asked about this. Probably won't get to it anytime soon, but I told him/her I would check in.


@vdauwera commented on Mon Dec 05 2016

I responded that we're waiting on the tie-outs.


@ldgauthier commented on Tue Dec 06 2016

Tie-outs?

I started on this, but I definitely won't finish it before the release.


@vdauwera commented on Tue Dec 06 2016

Functional equivalence of GATK4 ports. That's my party line for now. Didn't realize you had actually started on this... but yeah, 3.7 is going out today.


@ronlevine commented on Mon Jan 23 2017

@davidbenjamin Any thoughts on this? Laura thought she implemented a fix but it have any effect. I am becoming familiar with the HC code but it's slow going. While stepping though the code, I noticed it correctly identifies the events for the deletion and SNP.


@davidbenjamin commented on Mon Jan 23 2017

@ronlevine I would put in a few breakpoints to see where the spanning deletion allele gets lost from the SNP site VariantContext. I just ran things with the new qual out of curiosity and the bug persists, so it's probably not an issue of previous bugs with multiallelic sites.


@ronlevine commented on Mon Jan 23 2017

@davidbenjamin That's what I have been doing. I could not find where the spanning deletion alleles was ever added. I agree with Laura's code that the spanning deletion alleles should be added to the events in HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods.


@ronlevine commented on Tue Jan 24 2017

@davidbenjamin It looks like the issue is with ReferenceConfidenceModel.getOverlappingVariantContext(final GenomeLoc curPos, final Collection<VariantContext> maybeOverlapping), with the stack trace:

ReferenceConfidenceModel.calculateRefConfidence
ReferenceConfidenceModel.getOverlappingVariantContext
HaplotypeCaller.map

For curPos=9:418272, there are 2 variants, 9:418269-418273 TTTTG*,<NON_REF>,T and 9:418272 T*,<NON_REF>,C. This method returns the variant with the right-most start , so the variant with the deletion is ignored. This logic should be changed so that the variants that overlap curPos are merged and returned. A utility similar to GATKVariantContextUtils.simpleMerge might work.


@ldgauthier commented on Wed Mar 01 2017

I thought the state I left that branch was that it would output a merged
representation of the two events starting at the deletion position- so the
reference is some string of bases, allele 1 is the deletion, and allele 2
matches the reference for the length of the deletion with the exception of
the SNP. (Allele 2 is not the minimal representation yet.) That's the first
step to get the genotype right. After that we need to break up the events,
clean up the representation, and assign the genotype from the combined
event to both of them.

Hopefully that helps. (And hopefully I actually committed the version of
the branch that does what I said.)

On Jan 24, 2017 12:06 PM, "Ron Levine" [email protected] wrote:

@davidbenjamin https://github.com/davidbenjamin It looks like the issue
is with ReferenceConfidenceModel.getOverlappingVariantContext(final
GenomeLoc curPos, final Collection maybeOverlapping),
with the stack trace:

ReferenceConfidenceModel.calculateRefConfidence
ReferenceConfidenceModel.getOverlappingVariantContext
HaplotypeCaller.map

For curPos=9:418272, there are 2 variants, 9:418269-418273
TTTTG*,<NON_REF>,T and 9:418272 T*,<NON_REF>,C. This method returns the
variant with the right-most start , so the variant with the deletion is
ignored. This logic should be changed so that the variants that overlap
curPos are merged and returned. A utility such as GATKVariantContextUtils.
simpleMerge might work.


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
https://github.com/broadinstitute/gsa-unstable/issues/1499#issuecomment-274868889,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGRhdCAl7601U3RRbph6rOi0E7dqNawRks5rVi-fgaJpZM4KfOWm
.

@droazen
Copy link
Contributor

droazen commented Jun 5, 2017

Issue moved to broadinstitute/gatk #2960 via ZenHub

@droazen droazen closed this as completed Jun 5, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants