-
Notifications
You must be signed in to change notification settings - Fork 3
liftOver
Miotto in his Nature Genetics paper provides SNP allele counts that define each of his subgroups. Unfortunately, the coordinates in this table are based on the v2 genome (I think; it's v5.5), while ours are based on the v3 genome (v13.0). Talked to Susanne at PlasmoDB and she said there's not a great way of converting between these coordinates within PlasmoDB, but she suggested I perform a "liftover" of coordinates from v2 to v3. Generating the "chain file" needed for this can be complicated, but I found that the group from Sanger has already done this and made it available on their website. Downloaded the chain file, grabbed the pre-compiled executable (for a Ubuntu 12.xx environment) and had to install sudo apt-get install libmysqlclient-dev
on my laptop.
Once I got this up and running, I ran the example provided on the website, and it gave the expected results:
christian@t450s:~/Desktop/liftover$ ./liftOver mytest-s3+1.bed 2to3.liftOver newFileS3+1 newUnmapS3+1
Reading liftover chains
Mapping coordinates
christian@t450s:~/Desktop/liftover$ cat newFile
Pf3D7_07_v3 403620 403621 crt
Even though this seems to work, I want to do another check before going wild, mainly to make sure that this will work with our data, and in part to confirm that Miotto's v5.5 3D7 genome is the v2 build. So, get his excel file, and take the chr and position column and paste them into a bed-format file (tab delimited chr->start->end), where start is position - 1 and end is position. Also had to change the chr names from MAL## to Pf3D7_##. Run the liftOver
, then intersect with our_goods_UG.pass.vcf
, which has ~7200 variants in it. The intersection (via bedtools
) of these two files is 619 variants, which might mean this is working well. To check, I took the first variant and verified that it falls in the same gene in both the old and the new genome versions, and it does. To verify even further, I took the same BED file but added 1 to both the start and end positions, then ran the liftOver
and bedtools intersect
and got only 15 variants. So I think the original start and end positions worked well.
Now that I can convert