Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parenthetical format in simulated gene trees #11

Closed
crsl4 opened this issue Jun 16, 2023 · 1 comment
Closed

Parenthetical format in simulated gene trees #11

crsl4 opened this issue Jun 16, 2023 · 1 comment

Comments

@crsl4
Copy link
Member

crsl4 commented Jun 16, 2023

I am using the code below to simulate gene trees under a species tree with branches in numbers of generations and different population sizes per branch.
The simulated gene trees I get have a strange parenthetical format:
((D:40000.0)minus2:11184.0,(((C:20000.0)minus3:25551.0,(((A:10000.0)minus4:11096.0,(B:10000.0)minus4:11096.0):8904.0)minus3:25551.0):14449.0)minus2:11184.0);

Code:

using PhyloNetworks
using PhyloCoalSimulations
using PhyloPlots

tre = readTopology("(((A:10000,B:10000):20000,C:20000):40000,D:40000);")
plot(tre, showedgenumber=true, showedgelength=true);

## Population sizez (order by edge number in tre)
pop = [20000,20000,20000,20000,100000,20000]

## We now need to build a dictionary
Ne = Dict(i => pop[i] for i in 1:length(pop));

## Need to add the population at the root
rootedgenumber = PhyloCoalSimulations.get_rootedgenumber(tre)
push!(Ne, rootedgenumber => 20000);

## To chech that we built our dictionary correctly
using DataFrames, RCall
plot(tre, tipoffset=0.1, showedgelength=true, edgelabelcolor="red4",
          edgelabel=DataFrame(n=[e.number for e in tre.edge],
                              l=[Ne[e.number] for e in tre.edge]));
R"text"(x=1, y=2.5, Ne[rootedgenumber], adj=1, col="red4");
R"mtext"("red: Ne values", side=1, line=-1.5, col="red4");
R"mtext"("black: edge lengths", side=1, line=-0.5);

## Simulating the gene trees:
genetree_gen = simulatecoalescent(tre,1000000,1, Ne; nodemapping=true);
writeMultiTopology(genetree_gen, "genetrees-pop.tre")
@cecileane
Copy link
Member

The output gene trees would look more standard if you used the option nodemapping=false. The node mapping creates degree-2 node in gene trees, and gives them a name that corresponds to the name of a speciation or reticulation in the species network. More specifically, in each gene tree, a degree-2 node is created each time that the gene tree passes through a node in the species tree. In the gene tree, this degree-2 node is named after the corresponding node in the species network. There are more explanations and a figure in the documentation here.

That's a great option if you want to vary things across species in your simulation. The node mapping tells you which species each gene lineage evolved into. The manual gives a example here of how this mapping can be used to vary the substitution rate of each gene lineage according to the species that this lineage is in.

So:

  1. a pair of parentheses around a taxon name, like (D:40000.0), corresponds a degree-2 node at the time when the ancestor of D passed through the parent node of D in the species network. That was 40,000 generations prior to individual D.
  2. weird names like minus2 in (D:40000.0)minus2 are default names given to nodes in the species network. They are different from the node numbers. If the species network had names for all nodes, then these names are used. That's the most robust thing to do, because these ancestral taxon names can be saved in the newick format for the species network. If the species network is lacking names for some nodes, then default names are assigned. For a node with number -2, the default name is minus2.

If you don't want to use these degree-2 nodes, then we can turn off the node mapping option. Or you can save these simulated gene trees with the mapping information (just in case, who knows) and then clean them up (remove the degree-2 nodes) later: see here.

If you have some use for these degree-2 mapping nodes, then the manual shows here how to assign nicer names to nodes in the species network. And then I suggest that the modified network (with these nicer node names) is saved to a file, so that the correspondance can be made later between the ancestral species names and the degree-2 nodes (named after ancestral species) in the gene trees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants