Christian Parobek edited this page Jan 23, 2017 · 37 revisions

Welcome to the cambodiaWGS wiki!

I document my analysis in this wiki. The associated scripts (and possibly results) are kept in this repo.

###Getting Additional Populations:

  • getting otherPops
  • getting liftOver coordinates to compare my population with Miotto's SNP calls

###Steps to good SNP-call data:

  • HaplotypeCaller - notes on my decision to look into GATK's HaplotypeCaller for variant calling.
  • genomeCoverage - using bedtools and/or GATK's DepthOfCoverage.
  • variantCalling - documented as part of the gatk_pipeline repo; the HaplotypeCaller updates are documented here.
  • weakestLinks - remove the lowest-coverage samples prior to downstream analysis, which gives us a shot at getting full haplotypes for all SNPs.
  • variantFiltering - modeled loosely after Manske et al.
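
The weakestLinks idea can be sketched as ranking samples by mean depth and dropping the bottom few. This is only an illustration, not the script in this repo: the function name, the sample names, and the choice to drop a fixed number of samples (rather than apply a depth cutoff) are all assumptions.

```python
def drop_weakest(mean_depth, n_drop):
    """Return the samples kept after removing the n_drop lowest-coverage ones.

    mean_depth: dict mapping sample name -> mean genome-wide depth
                (hypothetical input, e.g. summarized from a coverage tool).
    """
    # Rank by depth, breaking ties by name so the result is deterministic.
    ranked = sorted(mean_depth, key=lambda s: (mean_depth[s], s))
    dropped = set(ranked[:n_drop])
    # Preserve the original sample order for the samples that remain.
    return [s for s in mean_depth if s not in dropped]


if __name__ == "__main__":
    depths = {"A": 30.0, "B": 4.5, "C": 18.2, "D": 2.1}  # made-up values
    print(drop_weakest(depths, 2))
```

Dropping a fixed count keeps the sample size predictable; a depth threshold would be the obvious alternative if the coverage distribution has a clear break.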

###Multiplicity of Infection:

  • Fws - using my Rmd Fws script to decide monoclonal vs multiclonal.
  • estMoi - using this Perl script to calculate an actual MOI for each of my Pv and Pf samples.
  • pvmsp1 deep sequencing - Deen deep sequenced all of these samples to get a sense of MOI. I want to see how well it agrees with Fws and estMoi.
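
For reference, Fws compares within-sample heterozygosity (Hw) with population-level heterozygosity (Hs) as Fws = 1 - Hw/Hs, so a sample with almost no within-host diversity scores near 1. The sketch below is a simplified version that takes the ratio of mean heterozygosities across SNPs; it is not the Rmd script, and the function names and the (ref, alt) count layout are assumptions made for illustration.

```python
def heterozygosity(ref_count, alt_count):
    """Expected heterozygosity 1 - (p^2 + q^2) at one biallelic SNP."""
    total = ref_count + alt_count
    if total == 0:
        return None  # no data at this site
    p = ref_count / total
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def fws(sample_counts, population_counts):
    """Simplified Fws = 1 - mean(Hw) / mean(Hs).

    sample_counts:     list of (ref, alt) read counts for one sample
    population_counts: list of (ref, alt) allele counts for the population,
                       at the same SNPs in the same order
    """
    hw_values, hs_values = [], []
    for sample, population in zip(sample_counts, population_counts):
        hw = heterozygosity(*sample)
        hs = heterozygosity(*population)
        # Skip sites with no data or no variation in the population.
        if hw is None or hs is None or hs == 0:
            continue
        hw_values.append(hw)
        hs_values.append(hs)
    if not hs_values:
        return None
    mean_hw = sum(hw_values) / len(hw_values)
    mean_hs = sum(hs_values) / len(hs_values)
    return 1.0 - mean_hw / mean_hs


if __name__ == "__main__":
    # Made-up counts: every site in the sample is fixed for one allele.
    print(fws([(20, 0), (0, 15)], [(50, 50), (30, 70)]))
```

A high cutoff (0.95 is a value often seen in the literature) is typically used to call a sample monoclonal, but the threshold is a judgment call and worth checking against estMoi and the pvmsp1 data.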

###Population Differentiation:

###Selective Sweeps:

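  • hapFLK - Andrew found this and recommended it. It looks sensitive and specific for sweeps even in complicated demographic backgrounds because, as I understand it, it infers local haplotype clusters and then computes an Fst-like statistic (FLK) on their frequencies while accounting for the hierarchical structure of the populations.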
  • selscan - This computes iHS, EHH, XP-EHH, and the newer nSL statistic. This is what I'm going to try to use for looking at selection.
  • rehh - This is what Andrew used to make bifurcation diagrams for EHH stats.
  • Identity by Descent - A related question to selective sweeps is identity by descent - finding genomic fragments that are shared among samples.
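
As a reminder of what the EHH-based tools above are measuring: among haplotypes carrying the same core allele, EHH at a given distance is the probability that two randomly chosen haplotypes are identical over the whole interval from the core to that position. The sketch below implements just that definition on toy phased haplotypes; the function name and the string encoding are assumptions, and it is no substitute for selscan or rehh.

```python
from collections import Counter
from math import comb  # Python 3.8+


def ehh(haplotypes, core, stop):
    """Extended haplotype homozygosity between two site indices (inclusive).

    haplotypes: list of equal-length strings (e.g. '0'/'1' per SNP), all
                assumed to carry the same allele at the core site.
    core, stop: indices of the core SNP and the SNP to extend to.
    """
    n = len(haplotypes)
    if n < 2:
        return None  # homozygosity is undefined for fewer than two haplotypes
    lo, hi = sorted((core, stop))
    # Group haplotypes that are identical across the whole interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of haplotype pairs that fall in the same group.
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


if __name__ == "__main__":
    haps = ["0000", "0001", "0011", "0000"]  # toy data, core at index 0
    for stop in range(4):
        print(stop, ehh(haps, 0, stop))
```

EHH starts at 1 at the core and decays with distance; iHS and XP-EHH are built by integrating this decay and comparing it between alleles or between populations.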

###Cool Images:

  • circos - Cool, Perl-based circular image maker, but I actually ended up using circlize in R.

###Bottlenecks:

  • Loss of Rare Alleles - If we find a greater loss of rare alleles in P. falciparum than in P. vivax, then maybe that's evidence that P. falciparum has undergone a greater genetic bottleneck than P. vivax.
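
One simple way to put a number on "loss of rare alleles" is the share of segregating sites whose minor allele is seen in only a few samples, which can be read off the folded site frequency spectrum. The sketch below assumes one haploid call per sample (that is, monoclonal samples) and no missing genotypes; the function names and the singleton default are illustrative assumptions.

```python
from collections import Counter


def folded_sfs(alt_counts, n_samples):
    """Folded site frequency spectrum: minor allele count -> number of sites.

    alt_counts: number of samples carrying the alternate allele at each SNP
    n_samples:  number of haploid samples genotyped at every SNP
    """
    sfs = Counter()
    for alt in alt_counts:
        minor = min(alt, n_samples - alt)
        if minor > 0:  # ignore sites that are fixed in this sample set
            sfs[minor] += 1
    return sfs


def rare_fraction(alt_counts, n_samples, max_minor_count=1):
    """Fraction of segregating sites with minor allele count <= max_minor_count."""
    sfs = folded_sfs(alt_counts, n_samples)
    total = sum(sfs.values())
    if total == 0:
        return None
    rare = sum(count for minor, count in sfs.items() if minor <= max_minor_count)
    return rare / total


if __name__ == "__main__":
    counts = [1, 1, 9, 5, 0, 10, 3]  # made-up alt allele counts, 10 samples
    print(rare_fraction(counts, n_samples=10))
```

Comparing this fraction between species only makes sense at matched sample sizes, so the larger population would need to be downsampled first.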

###Demography

  • dadi - Inference of population demographic parameters. What Hartl used in their MBE paper.

###Miscellaneous

  • snpEff - Use it to predict the functional effects of SNPs. This will be useful for subsetting variants by class for analysis.
  • characterizeCoverage - Use bedtools to determine percent of genes covered to a certain depth.
  • resistanceSNPs - Want to determine the frequency of variants in resistance genes for Pv and Pf.
  • exonuclease - Exploring association between CP groups and Chr 13 Exo 415 mutation.
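
For characterizeCoverage, the summary step after running a per-base depth tool might look like the sketch below: a gene counts as covered if a minimum fraction of its bases reach a minimum depth. The input layout (a dict of per-base depths per gene), the gene names, and both thresholds are assumptions for illustration, not the actual bedtools workflow.

```python
def percent_genes_covered(gene_depths, min_depth=5, min_fraction=0.9):
    """Percent of genes with at least min_fraction of bases at >= min_depth.

    gene_depths: dict mapping gene ID -> list of per-base read depths
                 (hypothetical layout, e.g. parsed from per-base depth output).
    """
    if not gene_depths:
        return None
    passing = 0
    for depths in gene_depths.values():
        if not depths:
            continue  # a gene with no positions cannot pass
        covered = sum(1 for d in depths if d >= min_depth)
        if covered / len(depths) >= min_fraction:
            passing += 1
    return 100.0 * passing / len(gene_depths)


if __name__ == "__main__":
    genes = {  # made-up gene IDs and depths
        "gene1": [10, 10, 10, 10],
        "gene2": [0, 0, 10, 10],
        "gene3": [5, 5, 5, 4],
    }
    print(percent_genes_covered(genes, min_depth=5, min_fraction=0.75))
```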