Home
Christian Parobek edited this page Jan 23, 2017 · 37 revisions
I will document my analysis through this wiki. The associated scripts (and results?) will be kept in this repo.
###Getting Additional Populations:
###Steps to good SNP-call data:
- HaplotypeCaller - decided that I should look into GATK's `HaplotypeCaller`.
- genomeCoverage - using `bedtools` and/or GATK's `DepthOfCoverage`.
- variantCalling - documented as part of the gatk_pipeline repo and the `HaplotypeCaller` updates are documented here.
- weakestLinks - remove the lowest-coverage samples prior to downstream analysis. Gives us a shot at getting full haplotypes for all SNPs.
- variantFiltering - modeled loosely after Manske et al.
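
A minimal sketch of the weakestLinks idea, not the actual script in this repo. It assumes a mean read depth per sample is already available (for example from the genomeCoverage step); the function name, the sample names, the depths, and the cutoff of 10 are all hypothetical.

```python
def drop_weakest_links(mean_depth, min_depth):
    """Split samples into kept and dropped by mean coverage.

    mean_depth: dict mapping sample name -> mean genome-wide read depth
    min_depth:  hypothetical cutoff; pick it from the observed distribution
    """
    kept = sorted(s for s, d in mean_depth.items() if d >= min_depth)
    dropped = sorted(s for s, d in mean_depth.items() if d < min_depth)
    return kept, dropped


# Made-up depths, for illustration only
depths = {"sampleA": 42.0, "sampleB": 3.5, "sampleC": 18.2}
kept, dropped = drop_weakest_links(depths, min_depth=10.0)
print("keep:", kept)
print("drop:", dropped)
```

The cutoff is a trade-off: a stricter value keeps fewer samples but leaves fewer missing genotypes per SNP.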
###Multiplicity of Infection:
- Fws - using my `Rmd` Fws script to decide monoclonal vs multiclonal.
- estMoi - using this `perl` script to calculate an actual MOI for each of my Pv and Pf samples.
- pvmsp1 deep sequencing - Deen deep sequenced all these to get a sense of MOI. Want to see how well it agrees with Fws and estMoi.
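
For reference, a simplified sketch of the Fws calculation, not the Rmd script itself. It assumes Fws = 1 - Hw/Hs, with Hw the within-sample heterozygosity (from read-count allele fractions) and Hs the population heterozygosity, each averaged over biallelic SNPs. Published implementations typically bin SNPs by allele frequency and fit a regression instead of taking a plain ratio of means. The 0.95 cutoff below is an assumed convention, not something fixed by this wiki, and all input values are made up.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 1.0 - (p * p + (1.0 - p) * (1.0 - p))


def fws(sample_freqs, pop_freqs):
    """Simplified Fws = 1 - mean(Hw) / mean(Hs) over the same set of SNPs.

    sample_freqs: within-sample reference-read fraction at each SNP
    pop_freqs:    population reference-allele frequency at each SNP
    """
    hw = sum(heterozygosity(p) for p in sample_freqs) / len(sample_freqs)
    hs = sum(heterozygosity(p) for p in pop_freqs) / len(pop_freqs)
    if hs == 0:
        raise ValueError("population heterozygosity is zero at all sites")
    return 1.0 - hw / hs


# Made-up frequencies, for illustration only
pop = [0.5, 0.5, 0.5]
clonal_like = [1.0, 0.0, 1.0]   # every site fixed within the sample
mixed_like = [0.5, 0.5, 0.5]    # every site mixed within the sample

CUTOFF = 0.95  # assumed threshold for calling a sample monoclonal
for name, freqs in [("clonal_like", clonal_like), ("mixed_like", mixed_like)]:
    value = fws(freqs, pop)
    print(name, value, "monoclonal" if value > CUTOFF else "multiclonal")
```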
###Population Differentiation:
- PopGenome - the "Swiss Army Knife" package for population genetic analyses.
- vcf2structure - VCF file format must be converted to `STRUCTURE` format for both `STRUCTURE` and `adegenet` analysis.
- structure
- `adegenet` - this is for PCA in `R`
- recombination
- Derrick's dupFinder
- Structural Variant Analysis - Using primarily LUMPY.
- mal10mal13 - Jess suggested that we look to see if these mutations, described in Michele's Lancet ID paper, correlate with CP group.
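
A rough sketch of the vcf2structure conversion mentioned above, not the script used here. It assumes haploid genotype calls have already been pulled out of the VCF as "0", "1", or "." per sample, writes one row per sample, and codes missing calls as -9. The function name and the data are hypothetical, and the real input file may need extra columns (population labels, for instance) depending on how `STRUCTURE` is configured.

```python
def to_structure_rows(samples, genotypes, missing="-9"):
    """Build one STRUCTURE-style row per haploid sample.

    samples:   list of sample names
    genotypes: list of per-SNP lists, one call per sample ("0", "1", or ".")
    missing:   code written for missing calls (assumed to be -9 here)
    """
    rows = []
    for i, name in enumerate(samples):
        calls = [missing if site[i] == "." else site[i] for site in genotypes]
        rows.append(" ".join([name] + calls))
    return rows


# Made-up calls: three SNPs, two samples
samples = ["s1", "s2"]
genotypes = [["0", "1"], [".", "1"], ["1", "0"]]
for row in to_structure_rows(samples, genotypes):
    print(row)
```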
###Selective Sweeps:
- hapFLK - Andrew found this and recommended it. Looks like it is sensitive and specific for sweeps even in complicated demographic backgrounds, because it builds extended haplotypes and then runs an Fst-like test (FLK) on those.
- `selscan` - This does iHS, EHH, XP-EHH, and something even newer. What I'm going to try to use for looking at selection.
- `rehh` - This is what Andrew used to make bifurcation diagrams for EHH stats.
- Identity by Descent - A related question to selective sweeps is identity by descent - finding genomic fragments that are shared among samples.
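
To make the EHH statistic concrete, here is a small sketch of the textbook definition: among haplotypes carrying the same core allele, EHH out to a given marker is the fraction of haplotype pairs that are identical over the whole interval. This is not how `selscan` or `rehh` are implemented internally; the function name and the toy haplotypes are made up.

```python
from collections import Counter


def ehh(haplotypes, core, upto):
    """EHH from the core SNP index out to index `upto` (inclusive).

    haplotypes: list of equal-length strings of "0"/"1", all carrying the
                same allele at the core SNP
    """
    lo, hi = sorted((core, upto))
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Group haplotypes that are identical across the whole interval
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return same_pairs / (n * (n - 1) // 2)


# Toy haplotypes sharing allele "1" at the core SNP (index 0)
haps = ["1100", "1101", "1110", "1110"]
for marker in range(4):
    print(marker, round(ehh(haps, 0, marker), 3))
```

EHH starts at 1 at the core and decays as more markers are included; a sweep shows up as unusually slow decay around the selected allele.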
###Cool Images:
- `circos` - Cool, `perl`-based circular image maker. But I actually ended up using `circlize` in R.
###Bottlenecks:
- Loss of Rare Alleles - If we find a greater loss of rare alleles in P. falciparum than in P. vivax, then maybe that's evidence that P. falciparum has undergone a greater genetic bottleneck than P. vivax.
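
One simple way to put a number on this comparison, sketched under assumptions: take the minor allele count at each segregating SNP in each species and compute the fraction that are rare (singletons by default). The function name and the counts are made up, and the two data sets need comparable sample sizes (or downsampling) before the fractions mean anything side by side.

```python
def rare_allele_fraction(minor_counts, max_count=1):
    """Fraction of segregating SNPs whose minor allele count is <= max_count."""
    segregating = [c for c in minor_counts if c > 0]
    if not segregating:
        raise ValueError("no segregating sites")
    rare = sum(1 for c in segregating if c <= max_count)
    return rare / len(segregating)


# Made-up minor allele counts, for illustration only
pf_counts = [1, 1, 2, 5, 8]
pv_counts = [1, 1, 1, 1, 3]
print("Pf rare fraction:", rare_allele_fraction(pf_counts))
print("Pv rare fraction:", rare_allele_fraction(pv_counts))
```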
###Demography
- dadi - Inference of population demographic parameters. What Hartl used in their MBE paper.
###Miscellaneous
- snpEff - Use it to predict SNP functions. Will be useful for subsetting variants for class analysis.
- characterizeCoverage - Use `bedtools` to determine percent of genes covered to a certain depth.
- resistanceSNPs - Want to determine the frequency of variants in resistance genes for Pv and Pf.
- exonuclease - Exploring association between CP groups and Chr 13 Exo 415 mutation.
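
A sketch of the summary behind characterizeCoverage, assuming per-base read depths for each gene are already in hand (however they were produced). The gene names, depths, and the 5x cutoff are hypothetical.

```python
def percent_covered(depths, min_depth):
    """Percent of positions in one gene with read depth >= min_depth."""
    if not depths:
        raise ValueError("gene has no positions")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Made-up per-base depths for two hypothetical genes
gene_depths = {
    "geneA": [0, 5, 10, 20],
    "geneB": [12, 15, 9, 30],
}
for gene, depths in sorted(gene_depths.items()):
    print(gene, percent_covered(depths, min_depth=5))
```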