Skip to content
Christian Parobek edited this page Nov 20, 2015 · 7 revisions

Miotto in his Nature Genetics paper provides SNP allele counts that define each of his subgroups. Unfortunately, the coordinates in this table are based on the v2 genome (I think; it's v5.5), while ours are based on the v3 genome (v13.0). Talked to Susanne at PlasmoDB and she said there's not a great way of converting between these coordinates within PlasmoDB, but she suggested I perform a "liftover" of coordinates from v2 to v3. Generating the "chain file" needed for this can be complicated, but I found that the group from Sanger has already done this and made it available on their website. Downloaded the chain file, grabbed the pre-compiled executable (for a Ubuntu 12.xx environment) and had to install sudo apt-get install libmysqlclient-dev on my laptop.

Once I got this up and running, I ran the example provided on the website, and it gave the expected results:

christian@t450s:~/Desktop/liftover$ ./liftOver mytest-s3+1.bed 2to3.liftOver newFileS3+1 newUnmapS3+1
Reading liftover chains
Mapping coordinates
christian@t450s:~/Desktop/liftover$ cat newFile
Pf3D7_07_v3	403620	403621	crt

Even though this seems to work, I want to do another check before going wild, mainly to make sure that this will work with our data, and in part to confirm that Miotto's v5.5 3D7 genome is the v2 build. So, get his excel file, and take the chr and position column and paste them into a bed-format file (tab delimited chr->start->end), where start is position - 1 and end is position. Also had to change the chr names from MAL## to Pf3D7_##. Run the liftOver, then intersect with our_goods_UG.pass.vcf, which has ~7200 variants in it. The intersection (via bedtools) of these two files is 619 variants, which might mean this is working well. To check, I took the first variant and verified that it falls in the same gene in both the old and the new genome versions, and it does. To verify even further, I took the same BED file but added 1 to both the start and end positions, then ran the liftOver and bedtools intersect and got only 15 variants. So I think the original start and end positions worked well.

Now that I can convert coordinates, this is what I need to do:

  1. liftOver the entire Miotto bed
  2. Split the VCF into CP group VCFs
  3. Intersect liftedOver bed with by-group VCFs to produce by-group liftOvers
  4. Intersect liftedOver bed with by-group VCFs to produce intersected liftover bed

These are described further in liftOver.sh. Once this is done, I can read in to R and make a boxplot. Lone boxplot is at boxplotter.r, and PCA-with-boxplot is at pca_boxplot_plotter.r.

All of these files live on Kure now, but I did the liftover on my Ubuntu laptop.

Clone this wiki locally