Preparing concordance factors

In this notebook, we read a table of all BUCKy concordance factors (CFs), map the individual names to species names, and then average the CFs over species. Finally, we export the resulting table as CSV so we can use it in further analyses.

Let's start by checking the installed PhyloNetworks version and loading the packages:

In [1]:
using Pkg
pkgs = Pkg.installed()
pkgs["PhyloNetworks"]
┌ Warning: Pkg.installed() is deprecated
└ @ Pkg /scratch/pkrastev/lmod_build/julia-1.5.1/usr/share/julia/stdlib/v1.5/Pkg/src/Pkg.jl:554
Out[1]:
v"0.11.0"
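
The deprecation warning above comes from Pkg.installed(). A minimal sketch of one way to look up the version without it, assuming Julia 1.4 or later, where Pkg.dependencies() is available. The helper name installed_version is hypothetical and not part of Pkg:

```julia
using Pkg

# Hypothetical helper (not part of Pkg): return the version of a package
# in the active environment, or `nothing` if the package is not found.
function installed_version(pkgname::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == pkgname && return info.version
    end
    return nothing
end

installed_version("PhyloNetworks")
```

The call returns a VersionNumber when the package is present in the active environment, and nothing otherwise.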
In [2]:
import PhyloNetworks
using CSV
using DataFrames
In [3]:
CF = PhyloNetworks.mapAllelesCFtable("pnetworks_pops.txt","test_cf.csv")
┌ Warning: some alleles in the mapping file do not occur in the quartet CF data. Extra allele names will be ignored
└ @ PhyloNetworks /n/home08/souzademedeiros/.julia/packages/PhyloNetworks/zeoh2/src/multipleAlleles.jl:251
Out[3]:

49 rows × 14 columns (omitted printing of 8 columns)

Row  taxon1         taxon2         taxon3         taxon4         CF12_34     CF12_34_lo
     String         String         String         String         Float64     Float64
1    botryophora_I  botryophora_I  botryophora_I  botryophora_I  0.3333      0.0
2    coronata_H     coronata_B     botryophora_I  coronata_A     0.160389    0.0
3    coronata_H     coronata_B     botryophora_I  coronata_A     0.74        0.280702
4    coronata_H     coronata_B     botryophora_I  coronata_A     0.566071    0.0714286
5    coronata_H     coronata_D     botryophora_I  coronata_A     0.1802      0.0
6    coronata_H     coronata_D     botryophora_I  coronata_A     0.188938    0.0
7    coronata_H     coronata_D     botryophora_I  coronata_A     0.173659    0.0
8    coronata_H     coronata_D     botryophora_I  coronata_A     0.32281     0.0
9    coronata_H     coronata_F     botryophora_I  coronata_A     0.105029    0.0
10   coronata_H     coronata_F     botryophora_I  coronata_A     0.242671    0.0273973
11   coronata_H     coronata_F     botryophora_I  coronata_A     0.516583    0.148148
12   coronata_H     coronata_F     botryophora_I  coronata_A     0.863148    0.477273
13   coronata_H     coronata_F     botryophora_I  coronata_A     0.444919    0.116279
14   coronata_H     coronata_F     botryophora_I  coronata_A     0.41586     0.0465116
15   coronata_H     kellyana_C     botryophora_I  coronata_A     0.287558    0.0620155
16   coronata_H     kellyana_C     botryophora_I  coronata_A     0.487256    0.116279
17   coronata_H     cf_vagans      botryophora_I  coronata_A     0.00780233  0.0
18   coronata_H     cf_vagans      botryophora_I  coronata_A     0.00751163  0.0
19   coronata_H     cearensis_E    botryophora_I  coronata_A     0.383045    0.120301
20   coronata_H     cearensis_E    botryophora_I  coronata_A     0.439526    0.142857
21   coronata_H     oleracea_C     botryophora_I  coronata_A     0.0858095   0.0
22   coronata_H     oleracea_C     botryophora_I  coronata_A     0.0865952   0.0
23   coronata_H     coconut_ref    botryophora_I  coronata_A     0.00427815  0.0
24   coronata_H     x_costae_E     botryophora_I  coronata_A     0.667667    0.259259
25   coronata_H     x_costae_E     botryophora_I  coronata_A     0.569533    0.2
26   coronata_H     coronata_G     botryophora_I  coronata_A     0.701831    0.112676
27   coronata_H     coronata_G     botryophora_I  coronata_A     0.730139    0.361111
28   coronata_H     coronata_G     botryophora_I  coronata_A     0.584417    0.152778
29   coronata_H     coronata_G     botryophora_I  coronata_A     0.706727    0.25974
30   coronata_H     coronata_G     botryophora_I  coronata_A     0.670895    0.0877193
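
The table above has group labels in place of individual names, which is why the same 4-taxon set appears on several rows. A minimal sketch of that renaming idea in plain Julia follows. It is not the PhyloNetworks implementation, and the individual names (ind_01 and so on) and the helper map_taxon are hypothetical, invented for illustration:

```julia
# Hypothetical individual-to-group mapping; the individual names are invented.
allele2species = Dict(
    "ind_01" => "coronata_H",
    "ind_02" => "coronata_H",
    "ind_03" => "botryophora_I",
)

# Hypothetical helper: rename a taxon if it appears in the mapping,
# otherwise keep the name unchanged.
map_taxon(name, mapping) = get(mapping, name, name)

# One 4-taxon set of individuals, renamed label by label.
quartet = ("ind_01", "ind_02", "ind_03", "coconut_ref")
mapped = map(t -> map_taxon(t, allele2species), quartet)
```

Names missing from the mapping pass through unchanged in this sketch, so a taxon represented by a single sample keeps its label.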

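Something is wrong with readTableCF!: it is not averaging over species. It only drops the one uninformative 4-taxon set and keeps the remaining 48 rows as separate quartets: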

In [4]:
dataCF_summarized = PhyloNetworks.readTableCF!(CF)
found 1 4-taxon sets uninformative about between-species relationships, out of 49.
These 4-taxon sets will be deleted from the data frame. 48 informative 4-taxon sets will be used.
between 6.0 and 151.0 gene trees per 4-taxon set
Out[4]:
Object DataCF
number of quartets: 48

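When we call mergeRows explicitly, keeping the four taxon columns plus columns 5, 8, 11 and 14 (presumably the three CF values and the number of genes), it works and the repeated 4-taxon sets are averaged: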

In [5]:
dataCF_summarized = PhyloNetworks.readTableCF!(PhyloNetworks.mergeRows(CF, [1,2,3,4,5,8,11,14]))
10 unique 4-taxon sets were found. CF values of repeated 4-taxon sets will be averaged (ngenes too).
between 32.375 and 151.0 gene trees per 4-taxon set
Out[5]:
Object DataCF
number of quartets: 10
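
To make the averaging step concrete, here is a minimal sketch in plain Julia of averaging CF values over repeated 4-taxon sets. It illustrates the idea only and is not how PhyloNetworks implements mergeRows. The taxon names, CF values and the function name average_repeated are all hypothetical:

```julia
using Statistics: mean

# Hypothetical rows: a 4-taxon set paired with one CF value. Values are invented.
rows = [
    (("sp_A", "sp_B", "sp_C", "sp_D"), 0.25),
    (("sp_A", "sp_B", "sp_C", "sp_D"), 0.75),
    (("sp_A", "sp_B", "sp_C", "sp_E"), 0.40),
]

# Hypothetical helper: collect the CF values of each distinct 4-taxon set,
# then replace them with their mean.
function average_repeated(rows)
    grouped = Dict{NTuple{4,String},Vector{Float64}}()
    for (taxa, cf) in rows
        push!(get!(grouped, taxa, Float64[]), cf)
    end
    return Dict(taxa => mean(cfs) for (taxa, cfs) in grouped)
end

averaged = average_repeated(rows)
```

In this sketch two 4-taxon sets count as the same only when the four labels match in the same order. The real table carries three CF columns and a gene count per row, each of which would be averaged in the same way.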