variantFiltering 2
Jeff wanted to know whether our GATK quality filtering pipeline is throwing out SNPs at sites that have two alleles in MOI > 1 samples. This could get rid of truly informative sites. So, do two things:
- Think through the GATK filtering pipeline. What does it do with variant sites that have two different nucleotides in a single clinical isolate (i.e. one with MOI > 1)?
- Somehow use estMOI or similar to determine the polyclonality of our samples. This is kind of circular, because such programs rely on me providing them with a list of SNPs I have already called at which to perform the MOI analysis.

I'm unhappy with approach #2, above, since it depends on the variant-calling process itself. So I think the best option is to use a variant-calling-independent way of determining MOI to stratify samples into monoclonal- and polyclonal-isolate pools, then run the VCF calling and population genetics on those two pools separately to check that we get the same results. If the results are the same, then we'll accept the 81K SNPs that we have now.
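To make the first question concrete, here is a minimal Python sketch of how one might spot sites where a single sample carries reads for two alleles. It assumes a multi-sample VCF whose FORMAT column includes the standard AD (allele depth) field. The function name, the sample names, and both thresholds (`min_minor_frac`, `min_depth`) are hypothetical choices for illustration, not anything the pipeline currently uses.

```python
def mixed_samples(vcf_line, sample_names, min_minor_frac=0.1, min_depth=10):
    """Return the samples whose reads support two alleles at this site.

    vcf_line     : one tab-delimited VCF data line (not a header line)
    sample_names : sample IDs in the same order as the VCF sample columns
    Thresholds are illustrative assumptions, not validated cutoffs.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return []
    ad_index = format_keys.index("AD")

    mixed = []
    for name, sample_field in zip(sample_names, fields[9:]):
        parts = sample_field.split(":")
        if ad_index >= len(parts):
            continue  # trailing fields dropped for this sample
        try:
            depths = [int(d) for d in parts[ad_index].split(",")]
        except ValueError:
            continue  # missing depths such as "." or ".,."
        total = sum(depths)
        if len(depths) < 2 or total < min_depth:
            continue
        # Depth of the second most abundant allele in this sample.
        second = sorted(depths, reverse=True)[1]
        if second / total >= min_minor_frac:
            mixed.append(name)
    return mixed


# Hypothetical record: the first sample has reads for both alleles.
example = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:12,8\t1/1:0,30\t0/0:25,1"
print(mixed_samples(example, ["OM001", "OM002", "OM003"]))
```

Running this over the pre-filter and post-filter VCFs and comparing how many mixed-allele sites survive would be one way to see whether the filters remove them disproportionately.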
So here's the thing: recluster the pvmsp1 reads that Deen sequenced and Nick M clustered, and keep a close record of what I do. Since I only deep sequenced pvmsp1 in the OM samples, and not in BB or KP, I may have to restrict my analysis to the OM samples.
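The stratify-then-compare plan could be sketched roughly as follows. This assumes the pvmsp1 reclustering yields one haplotype count per sample and that each pool's SNP calls can be reduced to a set of (chromosome, position) pairs. The function names and sample IDs are hypothetical.

```python
def stratify_by_moi(haplotype_counts):
    """Split samples into monoclonal and polyclonal pools.

    haplotype_counts : dict of sample ID -> number of distinct pvmsp1
                       haplotypes found by clustering (assumed input).
    Samples with no usable count (None or < 1) go in neither pool.
    """
    monoclonal, polyclonal = [], []
    for sample, n_haplotypes in sorted(haplotype_counts.items()):
        if n_haplotypes is None or n_haplotypes < 1:
            continue
        if n_haplotypes == 1:
            monoclonal.append(sample)
        else:
            polyclonal.append(sample)
    return monoclonal, polyclonal


def snp_overlap(sites_a, sites_b):
    """Jaccard overlap between two collections of (chrom, pos) SNP sites."""
    a, b = set(sites_a), set(sites_b)
    union = a | b
    if not union:
        return 1.0  # two empty call sets are trivially identical
    return len(a & b) / len(union)


# Hypothetical haplotype counts for four OM samples.
counts = {"OM01": 1, "OM02": 3, "OM03": 1, "OM04": None}
mono, poly = stratify_by_moi(counts)
print(mono, poly)
```

Site overlap is only a first check. Whether the two pools give "the same results" would still need to be judged on the population genetic statistics themselves.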
The other option is to calculate Fws (the within-sample F statistic) from the SNPs.
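For reference, Fws is usually defined as 1 - Hw/Hs, where Hw is the heterozygosity within a sample and Hs is the heterozygosity of the population. Below is a simplified Python sketch, assuming biallelic SNPs and per-sample (ref, alt) read counts. It uses a plain ratio of summed heterozygosities and takes the population allele frequency as the mean of the per-sample frequencies. Both are simplifying assumptions, and published implementations generally do something more careful, such as binning SNPs by minor allele frequency. The input layout is hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fws(read_counts, sample):
    """Simplified Fws = 1 - sum(Hw) / sum(Hs) for one sample.

    read_counts : dict of sample ID -> list of (ref_reads, alt_reads),
                  one tuple per SNP, same SNP order for every sample
                  (assumed layout).
    Returns None if the population heterozygosity sums to zero.
    """
    hw_sum = 0.0
    hs_sum = 0.0
    for i in range(len(read_counts[sample])):
        ref, alt = read_counts[sample][i]
        if ref + alt == 0:
            continue  # no coverage in the focal sample at this SNP
        # Population frequency: mean of per-sample alt-allele frequencies.
        freqs = []
        for counts in read_counts.values():
            r, a = counts[i]
            if r + a > 0:
                freqs.append(a / (r + a))
        p_pop = sum(freqs) / len(freqs)
        hw_sum += heterozygosity(alt / (ref + alt))
        hs_sum += heterozygosity(p_pop)
    if hs_sum == 0:
        return None
    return 1.0 - hw_sum / hs_sum


# Hypothetical counts: A and B look clonal, C looks like an even mixture.
counts = {
    "A": [(20, 0), (0, 20)],
    "B": [(0, 20), (20, 0)],
    "C": [(10, 10), (10, 10)],
}
for name in counts:
    print(name, fws(counts, name))
```

Values near 1 suggest a single dominant clone and lower values suggest a mixed infection. A cutoff around 0.95 is often used to call a sample monoclonal, but that should be checked rather than assumed. Like estMOI, this depends on SNPs that have already been called, so it has the same circularity problem.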
Wondering whether using GATK's HaplotypeCaller is a better option than UnifiedGenotyper. HaplotypeCaller is now recommended for calling any ploidy (they call it omniploidy). However, this comment suggests that HC will not work with mixed ploidy across samples, which is really what we're going for. Nevertheless, Dan Neafsey told us that HC could be better for calling in mixed isolates:
GATK haplotypecaller might do a better job in mixed samples than Unified Genotyper, particularly for SNP-dense regions. I haven't seen HC dramatically outperform UG on simple single infection samples, but it could shine with mixtures. That's also on my list of things to try someday.