
Need to handle multiallelic variants that have been decomposed and normalized. #23

Open
arq5x opened this issue Mar 24, 2015 · 0 comments


arq5x commented Mar 24, 2015

The appropriate workflow for multi-allelic variants is to decompose and normalize them so that each REF/ALT combination gets its own record. This preprocessing step is applied to the VCF before it is given to GQT. For example, consider the following multi-allelic record:

2   44101649    .   G   GC,GCC,GCCC,GCCCC   1/3:0,20,15,0,0:42:99:1528,741,642,703,0,777,1173,533,675,1117,1221,504,873,1158,1478

After decomposing and normalizing with vt, this will be split into 4 records.

2   44101649    .   G   GC  1/.:0,20,15,0,0:42:99:1528,741,642
2   44101649    .   G   GCC ./.:0,20,15,0,0:42:99:1528,703,777
2   44101649    .   G   GCCC    ./1:0,20,15,0,0:42:99:1528,1173,1117
2   44101649    .   G   GCCCC   ./.:0,20,15,0,0:42:99:1528,1221,1478
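The split above can be sketched in a few lines of Python. This is a minimal illustration, not vt's implementation, and it rests on several assumptions. It handles a single unphased diploid sample whose FORMAT is GT:AD:DP:GQ:PL (the example omits the FORMAT column). It copies AD, DP and GQ unchanged, as the example does. It subsets PL using the standard VCF genotype ordering, in which the likelihood for genotype j/k (j <= k) sits at index k(k+1)/2 + j. The function names are hypothetical.

```python
# Hypothetical sketch of the per-ALT split shown above (not vt's source code).
# Assumes one unphased diploid sample with FORMAT GT:AD:DP:GQ:PL.

def pl_index(j, k):
    """Index of diploid genotype j/k in a VCF PL list: k*(k+1)/2 + j, with j <= k."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j

def split_gt(gt, alt_index):
    """Project a multi-allelic GT onto the biallelic record for one ALT.

    The ALT being kept becomes 1, REF stays 0, and any other ALT becomes '.'.
    """
    out = []
    for a in gt.split("/"):
        if a == ".":
            out.append(".")
        elif int(a) == alt_index:
            out.append("1")
        elif int(a) == 0:
            out.append("0")
        else:
            out.append(".")
    return "/".join(out)

def split_pl(pl, alt_index):
    """Keep only the PL entries for 0/0, 0/i and i/i."""
    return [pl[pl_index(0, 0)],
            pl[pl_index(0, alt_index)],
            pl[pl_index(alt_index, alt_index)]]

def decompose(chrom, pos, ref, alts, sample):
    """Return one tab-delimited record per ALT allele."""
    gt, ad, dp, gq, pl = sample.split(":")
    pl_values = [int(x) for x in pl.split(",")]
    records = []
    for i, alt in enumerate(alts.split(","), start=1):
        new_pl = ",".join(str(x) for x in split_pl(pl_values, i))
        # AD, DP and GQ are copied unchanged, as in the example records
        new_sample = ":".join([split_gt(gt, i), ad, dp, gq, new_pl])
        records.append("\t".join([chrom, pos, ".", ref, alt, new_sample]))
    return records

if __name__ == "__main__":
    sample = ("1/3:0,20,15,0,0:42:99:"
              "1528,741,642,703,0,777,1173,533,675,1117,1221,504,873,1158,1478")
    for rec in decompose("2", "44101649", "G", "GC,GCC,GCCC,GCCCC", sample):
        print(rec)
```

Run on the record from this issue, the four genotypes come out as "1/.", "./.", "./1" and "./.", and the PL triples match the four decomposed records listed above.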

Notice that because the genotype in the original record is the heterozygote 1/3, the individual is heterozygous in the first and third new records. GQT needs to recognize genotypes such as "1/." and "./1" as heterozygotes, not unknowns.
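A minimal sketch of the requested classification, assuming decomposed (biallelic) diploid calls. The `Genotype` names and values are hypothetical and are not GQT's internal encoding. Treating "0/." and "./." as unknown is also an assumption, since the issue only specifies the "1/." and "./1" cases.

```python
from enum import Enum

class Genotype(Enum):
    # Hypothetical labels, not GQT's actual encoding
    HOM_REF = 0
    HET = 1
    HOM_ALT = 2
    UNKNOWN = 3

def classify(gt: str) -> Genotype:
    """Classify a biallelic diploid GT, counting half-missing '1' calls as HET."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype, got {gt!r}")
    a, b = alleles
    if {a, b} == {"1", "."}:
        # "1/." or "./1": produced by decomposing a multi-allelic het such as 1/3
        return Genotype.HET
    if "." in (a, b):
        # "./." and (by assumption) "0/." remain unknown
        return Genotype.UNKNOWN
    if a == b == "0":
        return Genotype.HOM_REF
    if a == b == "1":
        return Genotype.HOM_ALT
    return Genotype.HET  # "0/1" or "1/0"

if __name__ == "__main__":
    for gt in ["1/.", "./.", "./1", "./.", "0/0", "0/1", "1/1"]:
        print(gt, classify(gt).name)
```

Only the first branch differs from a conventional classifier that treats any missing allele as unknown, so the change requested here stays small.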

Now that GEMINI handles these decomposed VCFs (work from @brentp), adding this functionality will make it easier to use GQT as the pre-processing engine underlying GEMINI.
