Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.1.2; bump compat for PhyloNetworks to v0.16 #10

Merged
merged 4 commits into from
Apr 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions CITATION.bib
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
% reference for the package, and for the correlated inheritance model
@article{2023fogg_phylocoalsimulations,
author = {Fogg, John and Allman, Elizabeth S and An{\'e}, C{\'e}cile},
title = {{PhyloCoalSimulations}: A simulator for network multispecies coalescent
models, including a new extension for the inheritance of gene flow},
elocation-id = {2023.01.11.523690},
year = {2023},
doi = {10.1101/2023.01.11.523690},
journal = {bioRxiv}
}

% reference for the PhyloNetworks package, which this package depends on
@article{2017SolislemusBastideAne_PhyloNetworks,
author = {Sol{\'\i}s-Lemus, Claudia and Bastide, Paul and An{\'e}, C{\'e}cile},
title = {PhyloNetworks: A Package for Phylogenetic Networks},
journal = {Molecular Biology and Evolution},
year = {2017},
volume = {34},
number = {12},
pages = {3292-3298},
doi = {10.1093/molbev/msx235},
}
4 changes: 2 additions & 2 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@ name = "PhyloCoalSimulations"
uuid = "53f7b83a-06b5-4910-9a21-1896bf762ade"
authors = ["Cecile Ane <[email protected]>",
"John Fogg <[email protected]>"]
version = "0.1.1"
version = "0.1.2"

[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
PhyloNetworks = "33ad39ac-ed31-50eb-9b15-43d0656eaa72"

[compat]
Distributions = "0.25"
PhyloNetworks = "0.15.3"
PhyloNetworks = "0.16"
julia = "1.5"
4 changes: 4 additions & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@ makedocs(;
"correlated inheritance" => "man/correlated_inheritance.md",
"more examples" => "man/more_examples.md",
],
"library" => [
"public" => "lib/public.md",
"internals" => "lib/internal.md",
]
],
)

Expand Down
34 changes: 23 additions & 11 deletions docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,22 @@ simulate phylogenies under the coalescent.
It depends on [PhyloNetworks](https://github.com/crsl4/PhyloNetworks.jl)
for the phylogenetic data structures, and manipulation of phylogenies.

References:
please see [bibtex entries here](https://github.com/cecileane/PhyloCoalSimulations.jl/blob/main/CITATION.bib).

- for this package and the network coalescent model with inheritance correlation:\
[Fogg, Allman & Ané (2023)](https://doi.org/10.1101/2023.01.11.523690)
PhyloCoalSimulations: A simulator for network multispecies coalescent models,
including a new extension for the inheritance of gene flow.

- for [PhyloNetworks](https://github.com/crsl4/PhyloNetworks.jl),
which this package depends on:\
Solís-Lemus, Bastide & Ané (2017).
PhyloNetworks: a package for phylogenetic networks.
[Molecular Biology and Evolution](https://academic.oup.com/mbe/article/doi/10.1093/molbev/msx235/4103410/PhyloNetworks-a-package-for-phylogenetic-networks?guestAccessKey=230afceb-df28-4160-832d-aa7c73f86369)
34(12):3292–3298.
[doi:10.1093/molbev/msx235](https://doi.org/10.1093/molbev/msx235)

For a tutorial, see the manual:

```@contents
Expand All @@ -23,16 +39,12 @@ Pages = [
Depth = 3
```

References:
upcoming

## functions

Most functions are internal (not exported).
For help on individual functions, see the library:

```@index
```

```@autodocs
Modules = [PhyloCoalSimulations]
```@contents
Pages = [
"lib/public.md",
"lib/internal.md",
]
Depth = 3
```
21 changes: 21 additions & 0 deletions docs/src/lib/internal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# internal documentation

Documentation for `PhyloCoalSimulations`'s internal functions.
These functions are not exported and their access (API) should not be
considered stable. But they can still be used, like this for example:
`PhyloCoalSimulations.foo()` for a function named `foo()`.


## functions & types

```@autodocs
Modules = [PhyloCoalSimulations]
Public = false
Order = [:type,:function]
```

## index

```@index
Pages = ["internal.md"]
```
18 changes: 18 additions & 0 deletions docs/src/lib/public.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
# public documentation

Documentation for `PhyloCoalSimulations`'s public (exported) functions.
Most functions are internal (not exported).

## functions & types

```@autodocs
Modules = [PhyloCoalSimulations]
Private = false
Order = [:function,:type]
```

## index

```@index
Pages = ["public.md"]
```
33 changes: 26 additions & 7 deletions docs/src/man/correlated_inheritance.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,23 +16,42 @@ Another model is that the lineages at a hybrid node are *all* inherited
from the same parent, still choosing a (common) parent according to the γ
inheritance probabilities. This model has full (and positive) correlation
between lineages, and was used by
[Gerard, Gibbs & Kubatko (2011)](https://doi.org/10.1186/1471-2148-11-291).
[Gerard, Gibbs & Kubatko (2011)](https://doi.org/10.1186/1471-2148-11-291)
for example.

The other extreme might be interesting for modelling allopolyploid events:
of 2 lineages of the same locus at a hybrid node, exactly 1 of them comes
from one parent, and the other comes from the other parent. This model
would have negative correlation between lineages.

By default, `simulatecoalescent` uses the traditional model, with independent
lineages.
By default, [`simulatecoalescent`](@ref) uses the traditional model,
with independent lineages.

It also has an option to simulate lineage inheritance with positive correlation.
It also has an option to simulate lineage inheritance with positive correlation,
under a coalescent model described in
[Fogg, Allman & Ané (2023)](https://doi.org/10.1101/2023.01.11.523690).
For this, lineages' parents are drawn according to a Dirichlet process with
base distribution determined by the γ values, and with concentration parameter
α = (1-r)/r, that is, r = 1/(1+α), where r is the input inheritance correlation.
For example, if this correlation is set to 1, then α = 0 and all lineages inherit
from the same (randomly sampled) parent. The independence model corresponds
to r=0 and α infinite.
More specifically, consider 2 individuals (alleles) at a given locus,
that have not coalesced yet and are present at a give hybrid node.
According to the coalescent model with correlated inheritance,
the second individual is inherited from the same parent as the first individual
with probability r. And with probability 1-r, the second individual is
inherited from any parent (including the parent chosen by the first individual)
based on their γ values.

This model can be the result of different loci evolving according to different γ
inheritance values. For example, loci that are under selection in the environment
where gene flow occurred may be more likely to be passed through gene flow,
whereas loci that are involved in reproduction barriers might be less likely
to be passed through gene flow. This would result in different sets of γ values
across different loci, and correlated inheritance overall.

At one extreme, with correlation r=1 we have α = 0 and all lineages inherit
from the same (randomly sampled) parent. This is the *common inheritance* model.
The *independence model* corresponds to the other extreme r=0 and α infinite.

The same correlation r (or concentration α) parameter is used at all hybrid nodes,
but the Dirichlet process is applied independently across hybrid nodes.

Expand Down
2 changes: 1 addition & 1 deletion docs/src/man/more_examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ to extract the mapping information.
edge_count = Dict(e.number => 0 for e in net.edge)
const PCS = PhyloCoalSimulations; # for lazy typing!
for n in PCS.mappingnodes(tree) # iterate over degree-2 mapping nodes in the gene tree
child = PCS.singlechildedge(n)
child = getchildedge(n)
popid = PCS.population_mappedto(child) # number of species edge that 'n' came from
# sanity check below
isnothing(popid) && error("""population ID not found for the child edge of
Expand Down
10 changes: 7 additions & 3 deletions src/simulatecoalescent_network.jl
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,8 @@ can be used as a unique identifier of the edge above the network's root.
get_rootedgenumber(net) = max(0, maximum(e.number for e in net.edge)) + 1

"""
simulatecoalescent(net, nloci, nindividuals; nodemapping=false, inheritancecorrelation=0.0)
simulatecoalescent(net, nloci, nindividuals;
nodemapping=false, inheritancecorrelation=0.0)

Simulate `nloci` gene trees with `nindividuals` from each species
under the multispecies network coalescent, along network `net`
Expand Down Expand Up @@ -54,6 +55,8 @@ the same (randomly sampled) parent. More generally, the lineages' parents
are simulated according to a Dirichlet process with base distribution determined
by the γ values, and with concentration parameter α = (1-r)/r, that is, r = 1/(1+α),
where `r` is the input inheritance correlation.
For more details about this model, please read the package manual or refer
to [Fogg, Allman & Ané (2023)](https://doi.org/10.1101/2023.01.11.523690).

Assumptions:
- `net` must have non-missing edge lengths and γ values.
Expand Down Expand Up @@ -173,7 +176,7 @@ function simulatecoalescent(net::PN.HybridNetwork, nloci::Integer, nindividuals;
parentedgelist = PN.Edge[]
childedgelist = PN.Edge[]
for e in nn.edge
PN.getChild(e) === nn ? push!(parentedgelist, e) : push!(childedgelist, e)
PN.getchild(e) === nn ? push!(parentedgelist, e) : push!(childedgelist, e)
end
push!(parentedges, parentedgelist)
push!(childedges, childedgelist)
Expand Down Expand Up @@ -269,7 +272,8 @@ end

"""
simulatecoalescent(net, nloci, nindividuals, populationsize;
nodemapping=false, round_generationnumber=true, inheritancecorrelation=0.0)
nodemapping=false, round_generationnumber=true,
inheritancecorrelation=0.0)

Simulate `nloci` gene trees with `nindividuals` from each species
under the multispecies network coalescent, along network `net`,
Expand Down
2 changes: 1 addition & 1 deletion src/simulatecoalescent_onepop.jl
Original file line number Diff line number Diff line change
Expand Up @@ -273,7 +273,7 @@ function convert2tree!(rootnode::PN.Node)
end
function collect_edges_nodes!(net, parentedge)
push!(net.edge, parentedge)
nn = PN.getChild(parentedge)
nn = PN.getchild(parentedge)
push!(net.node, nn)
for childedge in nn.edge
childedge !== parentedge || continue # skip parentedge
Expand Down
25 changes: 2 additions & 23 deletions src/utils.jl
Original file line number Diff line number Diff line change
@@ -1,24 +1,3 @@
"""
hassinglechild(node)

Boolean: `true` if `node` has a single child edge,
based on the edge's `isChild1` attribute.
This function will soon be moved to PhyloNetworks.
"""
hassinglechild(node::PN.Node) = sum(e -> PN.getParent(e) === node, node.edge) == 1

"""
singlechildedge(node)

Child edge of `node`. Checks that it's a single child.
This function will soon be moved to PhyloNetworks.
"""
function singlechildedge(node::PN.Node)
ce_ind = findall(e -> PN.getParent(e) === node, node.edge)
length(ce_ind) == 1 || error("node number $(node.number) has $(length(ce_ind)) children instead of 1 child")
return node.edge[ce_ind[1]]
end

"""
population_mappedto(edge or node)

Expand All @@ -35,12 +14,12 @@ population_mappedto(e::Union{PN.Edge,PN.Node}) = (e.inCycle == -1 ? nothing : e.
Boolean: true if `node` is of degree 2, has a single child, and has a name.
(The root is of degree-2 but is not a mapping node).
"""
ismappingnode(n::PN.Node) = length(n.edge) == 2 && hassinglechild(n) && n.name != ""
ismappingnode(n::PN.Node) = length(n.edge) == 2 && PN.hassinglechild(n) && n.name != ""

"""
mappingnodes(gene tree)

Type to define an iterator over degree-2 mapping node in a gene tree, assuming these
Type to define an iterator over degree-2 mapping nodes in a gene tree, assuming these
degree-2 nodes (other than the root) have a name to map them to nodes in a
species phylogeny. See [`ismappingnode`](@ref PhyloCoalSimulations.mappingnodes).
"""
Expand Down
2 changes: 1 addition & 1 deletion test/test_multispeciesnetwork.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ genetree = simulatecoalescent(net, 1, Dict("A"=>2, "B"=>1))
genetree = simulatecoalescent(net, 1, 3; nodemapping=true)[1]
tmp = map(n -> n.name, PCS.mappingnodes(genetree))
@test !isempty(tmp) && all(tmp .== "minus2")
@test Set(unique(PCS.population_mappedto.(PCS.singlechildedge.(PCS.mappingnodes(genetree))))) == Set((1,2))
@test Set(unique(PCS.population_mappedto.(PN.getchildedge.(PCS.mappingnodes(genetree))))) == Set((1,2))
@test Set(unique(PCS.population_mappedto.(genetree.node))) == Set((1,2,3,nothing))
@test Set(unique(PCS.population_mappedto.(genetree.edge))) == Set((1,2,3))

Expand Down