Skip to content

Latest commit

 

History

History
139 lines (112 loc) · 5.09 KB

getting_started.md

File metadata and controls

139 lines (112 loc) · 5.09 KB
using PhyloNetworks, PhyloCoalSimulations, RCall
figpath = joinpath("..", "assets", "figures"); mkpath(figpath)
figname(x) = joinpath(figpath, x)
using Random; Random.seed!(432)

getting started

installation

To install Julia see here and to install Julia packages, see here. To install PhyloCoalSimulations in the Julia REPL (as well as PhyloNetworks for many utilities), enter package mode with ] and do this:

add PhyloCoalSimulations
add PhyloNetworks

or do this in julian mode:

using Pkg
Pkg.add("PhyloCoalSimulations")
Pkg.add("PhyloNetworks")

basic simulation example

example network

For a basic example, we use a simple 3-species network plotted below. On the left, the plot shows the edge numbers (black) and the γ inheritance values (blue). On the right, the length of horizontal lines are proportional to edge lengths, and the plot shows the edge length values.

using PhyloNetworks
net = readnewick("((C:0.9,(B:0.2)#H1:0.7::0.6):0.6,(#H1:0.6,A:1):0.5);");
using PhyloPlots
R"svg"(figname("net3taxa.svg"), width=6, height=3); # hide
R"par"(mar=[.1,.1,.1,.1]); R"layout"([1 2]); # hide
plot(net, showedgenumber=true, shownodelabel=true, showgamma=true, tipoffset=0.1);
R"mtext"("in grey: edge numbers", side=1, line=-1);  # hide
plot(net, showedgelength=true, useedgelength=true, tipoffset=0.1);
R"mtext"("in black: edge lengths", side=1, line=-1);  # hide
R"dev.off()" # hide
nothing # hide

3-taxon network

Note that this example network is not time consistent: the length of the path from the root to the hybridization node H1 is different depending if we go through the major edge (0.6+0.7=1.3) or the minor edge (0.5+0.6=1.1).

Coalescent simulations can be performed along such networks, also along non-ultrametric networks. If the network is ultrametric (time-consistent, and with all tips at the same distance from the root), then gene trees will also be ultrametric.

!!! note "rooting" The input network needs to be rooted correctly, as some properties of gene trees may depend on the root. If a network was estimated using a method that estimates a semidirected network (where the root placement is arbitrary) then make sure to root this network appropriately, e.g. using an outgroup, before simulating genes along this network.

basic example: simulate, save to file, plot

We use simulatecoalescent to simulate gene trees along this network. Below, we simulate 2 gene trees, with 1 individual per species.

trees = simulatecoalescent(net, 2, 1)

Branch lengths are assumed to be in coalescent units in the species network (number of generations / effective population size), and edge lengths in gene trees are also in coalescent units.

We can work with these gene trees within Julia with downstream code, and/or we can save them to a file:

writemultinewick(trees, stdout) # write them to standout output (screen here)
writemultinewick(trees, "genetrees.phy") # warning: will overwrite "genetrees.phy" if this file existed
rm("genetrees.phy") # hide

Let's plot these 2 gene trees. In the plots below, we annotate each edge with its attribute that tracked the network edge on which the coalescent event occured (where the gene tree lineage originated, going back in time). For example, the gene lineage that ends in A is always mapped to network edge 6, which is the number of the external edge to A in the network (see plot of network above on the left).

R"svg"(figname("genetrees_gettingstarted_1.svg"), width=6, height=3); # hide
R"par"(mar=[.1,.1,.1,.1]); R"layout"([1 2]); # hide
using DataFrames
for i in 1:2
  gt = trees[i]
  plot(gt, tipoffset=0.1,
           edgelabel=DataFrame(number = [e.number  for e in gt.edge],
                               label  = [population_mappedto(e) for e in gt.edge]));
  R"mtext"("gene $i", line=-1) # hide
end
R"mtext"("numbers: network edge each gene lineage maps to, at time of coalescence.\n8 = number of edge above the network root", side=1, line=-1, outer=true);  # hide
R"dev.off()" # hide
nothing # hide

3-taxon network

several individuals per species

We can ask for more individuals. To simulate 3 individuals / species for example:

simulatecoalescent(net, 1, 3) # 1 gene tree only. 3 individuals in each species

We can also ask for varying numbers of individuals. For example, we simulate below 2 individuals in A and 1 individual in each of B and C, using a dictionary to map species to their number of individuals:

genetrees = simulatecoalescent(net, 1, Dict("A"=>2, "B"=>1, "C"=>1));
writenewick(genetrees[1])

We can set 0 individuals within a species to simulate missing data.

genetrees = simulatecoalescent(net, 3, Dict("A"=>2, "B"=>1, "C"=>0));
writemultinewick(genetrees, stdout)