documentation for clusters and hardwired cluster distance
cecileane committed Jan 16, 2023
1 parent e1f8c7e commit 1112fbc
Showing 2 changed files with 114 additions and 16 deletions.
116 changes: 105 additions & 11 deletions src/compareNetworks.jl
@@ -55,12 +55,16 @@ function tree2Matrix(T::HybridNetwork, S::Union{Vector{String},Vector{Int}}; roo
end

"""
`hardwiredClusters(net::HybridNetwork, S::Union{AbstractVector{String},AbstractVector{Int}})`
hardwiredClusters(net::HybridNetwork, taxon_labels)
Returns a matrix describing all the hardwired clusters in a network.
Warnings: Clusters are rooted, so the root must be correct.
Allows for missing taxa, with entries all 0.
Returns a matrix describing all the hardwired clusters in a network, with
taxa listed in same order as in `taxon_labels` to describe their membership
in each cluster. Allows for missing taxa, with entries all 0.
Warnings:
- clusters are rooted, so the root must be correct.
- each hybrid node is assumed to have exactly 2 parents (no more).
Each row corresponds to one internal edge, that is, external edges are excluded.
If the root is a leaf node, the external edge to that leaf is included (first row).
@@ -69,6 +73,8 @@ Both parent hybrid edges to a given hybrid node only contribute a single row (th
- first column: edge number
- next columns: 0/1. 1=descendant of edge, 0=not a descendant, or missing taxon.
- last column: 10/11 values. 10=tree edge, 11=hybrid edge
See also [`hardwiredClusterDistance`](@ref) and [`hardwiredCluster`](@ref).
"""
function hardwiredClusters(net::HybridNetwork, S::Union{AbstractVector{String},AbstractVector{Int}})
ne = length(net.edge)-net.numTaxa # number of internal branch lengths
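The row layout described in the `hardwiredClusters` docstring (edge number, one 0/1 column per taxon, then 10 or 11) can be illustrated with a small standalone sketch. It does not use PhyloNetworks: `ToyEdge`, `descendant_taxa` and `toy_hardwired_clusters` are hypothetical names, the network is a hand-built parent/child edge list, and the edge numbers are arbitrary (they do not match the numbers PhyloNetworks would assign). It follows the stated rules: external edges are skipped, the two hybrid edges into one hybrid node share a single row, and taxa absent from the network get 0. The special case of a root placed at a leaf is not handled.

```julia
# Hypothetical toy representation, NOT the PhyloNetworks types:
# nodes are Ints, and leaves are the nodes listed in `leafname`.
struct ToyEdge
    number::Int     # edge number, reported in column 1
    parent::Int     # parent node
    child::Int      # child node
    ishybrid::Bool  # true for a hybrid edge
end

# All leaf names below `node`, by depth-first search along child edges.
# `visited` avoids re-walking the part below a hybrid node reached twice.
function descendant_taxa(edges::Vector{ToyEdge}, leafname::Dict{Int,String}, node::Int)
    found = Set{String}()
    visited = Set{Int}()
    stack = [node]
    while !isempty(stack)
        n = pop!(stack)
        n in visited && continue
        push!(visited, n)
        haskey(leafname, n) && push!(found, leafname[n])
        for e in edges
            e.parent == n && push!(stack, e.child)
        end
    end
    return found
end

# One row per internal edge: [edge number, 0/1 per taxon in `taxa`, 10 or 11].
# External edges (child is a leaf) are skipped; the two hybrid edges into the
# same hybrid node share one row (the first one met). Missing taxa get 0.
function toy_hardwired_clusters(edges::Vector{ToyEdge}, leafname::Dict{Int,String},
                                taxa::Vector{String})
    rows = Vector{Vector{Int}}()
    seenhybrid = Set{Int}()
    for e in edges
        haskey(leafname, e.child) && continue
        if e.ishybrid
            e.child in seenhybrid && continue
            push!(seenhybrid, e.child)
        end
        desc = descendant_taxa(edges, leafname, e.child)
        push!(rows, vcat(e.number, [t in desc ? 1 : 0 for t in taxa], e.ishybrid ? 11 : 10))
    end
    isempty(rows) && return zeros(Int, 0, length(taxa) + 2)
    return permutedims(reduce(hcat, rows))
end

# net1 from the hardwiredClusterDistance example below,
# "(t6,(t5,((t4,(t3,((t2,t1))#H1)),#H1)));", encoded by hand:
# root = node 10, hybrid node = 15, leaves 1..6 are t1..t6.
leafname = Dict(i => "t$i" for i in 1:6)
edges = [ToyEdge(1, 10, 6, false), ToyEdge(2, 10, 11, false),
         ToyEdge(3, 11, 5, false), ToyEdge(4, 11, 12, false),
         ToyEdge(5, 12, 13, false), ToyEdge(6, 13, 4, false),
         ToyEdge(7, 13, 14, false), ToyEdge(8, 14, 3, false),
         ToyEdge(9, 14, 15, true), ToyEdge(10, 12, 15, true),
         ToyEdge(11, 15, 16, false), ToyEdge(12, 16, 2, false),
         ToyEdge(13, 16, 1, false)]
taxa = ["t$i" for i in 1:6]
M = toy_hardwired_clusters(edges, leafname, taxa)
```

Apart from column 1, the rows of `M` reproduce the `hardwiredClusters(net1, taxa)` matrix shown in the docstring example.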
@@ -168,6 +174,8 @@ julia> hardwiredCluster(net5.edge[12], taxa) # descendants of 12th edge = CEF
1
0
```
See also [`hardwiredClusterDistance`](@ref) and [`hardwiredClusters`](@ref).
"""
function hardwiredCluster(edge::Edge,taxa::Union{AbstractVector{String},AbstractVector{Int}})
v = zeros(Bool,length(taxa))
@@ -675,19 +683,105 @@ end
"""
hardwiredClusterDistance(net1::HybridNetwork, net2::HybridNetwork, rooted::Bool)
Takes 2 networks and returns their hardwired cluster distance, that is,
the number of hardwired clusters found in one network and not in the other.
Note that this is not a distance per se on the full space of hybrid networks:
there are pairs of different networks for which this measure is 0.
But it is a distance on some network subspaces.
Hardwired cluster distance between the topologies of `net1` and `net2`, that is,
the number of hardwired clusters found in one network and not in the other
(with multiplicity, see below).
If the 2 networks are trees, this is the Robinson-Foulds distance.
If rooted=false, the trees are considered unrooted.
If rooted=false, then both networks are considered as semi-directed.
Networks are assumed bicombining (each hybrid has exactly 2 parents, no more).
## Dissimilarity vs distance
If rooted is false and one of the phylogenies is not a tree (1+ reticulations),
This is *not* a distance per se on the full space of phylogenetic networks:
there are pairs of distinct networks for which this dissimilarity is 0.
But it is a distance on some classes of networks, such as the class of
tree-child networks that are "normal" (without shortcuts), or the class of
tree-child networks that can be assigned node ages such that hybrid edges
have length 0 and tree edges have non-negative lengths. See
[Cardona, Rossello & Valiente (2008)](https://doi.org/10.1016/j.mbs.2007.11.003),
[Cardona, Llabres, Rossello & Valiente (2008)](https://doi.org/10.1109/TCBB.2008.70),
and [Huson, Rupp, Scornavacca (2010)](https://doi.org/10.1017/CBO9780511974076).
## Example
```jldoctest
julia> net1 = readTopology("(t6,(t5,((t4,(t3,((t2,t1))#H1)),#H1)));");
julia> taxa = sort(tipLabels(net1)); # t1 through t6, sorted alphabetically
julia> # using PhyloPlots; plot(net1, showedgenumber=true);
julia> # in matrix below: column 1: edge number. last column: tree (10) vs hybrid (11) edge
# middle columns: for 'taxa': t1,...t6. 1=descendant, 0=not descendant
hardwiredClusters(net1, taxa)
6×8 Matrix{Int64}:
13 1 1 1 1 1 0 10
12 1 1 1 1 0 0 10
10 1 1 1 1 0 0 10
9 1 1 1 0 0 0 10
8 1 1 0 0 0 0 11
7 1 1 0 0 0 0 10
julia> net2 = readTopology("(t6,(t5,((t4,(t3)#H1),(#H1,(t1,t2)))));");
julia> hardwiredClusters(net2, taxa)
6×8 Matrix{Int64}:
13 1 1 1 1 1 0 10
12 1 1 1 1 0 0 10
6 0 0 1 1 0 0 10
5 0 0 1 0 0 0 11
11 1 1 1 0 0 0 10
10 1 1 0 0 0 0 10
julia> hardwiredClusterDistance(net1, net2, true) # true: as rooted networks
4
```
## What is a hardwired cluster?
Each edge in a network is associated with its *hardwired cluster*, that is,
the set of all its descendant taxa (leaves). The set of hardwired clusters
of a network is the set of its edges' hardwired clusters. The dissimilarity
`d_hard` defined in [Huson, Rupp, Scornavacca (2010)](https://doi.org/10.1017/CBO9780511974076)
is the number of hardwired clusters that are in one network but not in the other.
This implementation is a slightly more discriminative version of `d_hard`, where
each cluster is counted with multiplicity and annotated with its edge's hybrid
status, as follows:
- External edges are not counted (they are tree edges to a leaf, shared by all
phylogenetic networks).
- A cluster is counted for each edge for which it's the hardwired cluster.
- At a given hybrid node, both hybrid partner edges have the same cluster,
so this cluster is only counted once for both partners.
- A given cluster is matched between the two networks only if it's the cluster
from a tree edge in both networks, or from a hybrid edge in both networks.
In the example above, `net1` has a shortcut (hybrid edge 11) resulting in 2 tree
edges (12 and 10) with the same cluster {t1,t2,t3,t4}. So cluster {t1,t2,t3,t4}
has multiplicity 2 in `net1`. `net2` also has this cluster, but only associated
with 1 tree edge, so this cluster contributes (2-1)=1 towards the hardwired cluster
distance between the two networks. The distance of 4 corresponds to these 4 clusters:
- {t1,t2,t3,t4}: twice in net1, once in net2
- {t3,t4}: absent in net1, once in net2
- {t1,t2}: twice in net1 (from a hybrid edge & a tree edge), once in net2
- {t3}: absent in net1 (because external edges are not counted),
once in net2 (from a hybrid edge).
Degree-2 nodes cause multiple edges to have the same cluster, so counting
clusters with multiplicity distinguishes a network with extra degree-2 nodes
from the "same" network after these nodes have been suppressed
(e.g. with [`PhyloNetworks.fuseedgesat!`](@ref) or [`PhyloNetworks.shrinkedge!`](@ref)).
## Networks as semi-directed
If `rooted` is false and one of the phylogenies is not a tree (1+ reticulations),
then all degree-2 nodes are removed before comparing the hardwired clusters,
and the minimum distance is returned over all possible ways to root the
networks at internal nodes.
See also: [`hardwiredClusters`](@ref), [`hardwiredCluster`](@ref)
"""
function hardwiredClusterDistance(net1::HybridNetwork, net2::HybridNetwork, rooted::Bool)
bothtrees = (net1.numHybrids == 0 && net2.numHybrids == 0)
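The counting rules listed under "What is a hardwired cluster?" amount to comparing two multisets of (cluster, edge type) pairs. The sketch below checks the arithmetic of the docstring example by applying that rule to the two matrices printed there. It is only an illustration of the rule as described for `rooted=true`, not the package's implementation; `cluster_counts` and `toy_cluster_distance` are hypothetical names, and the semi-directed case (minimum over rootings) is not covered.

```julia
# Hypothetical helper: count each row's (cluster, edge type) pair, with multiplicity.
# Column 1 (edge number) is dropped; the 10/11 code in the last column is kept,
# so a cluster from a tree edge never matches the same cluster from a hybrid edge.
function cluster_counts(M::AbstractMatrix{Int})
    counts = Dict{Vector{Int},Int}()
    for i in axes(M, 1)
        key = M[i, 2:end]
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

# For every (cluster, type) pair, add how many more copies one network has
# than the other: the number of clusters in one network and not in the other.
function toy_cluster_distance(M1::AbstractMatrix{Int}, M2::AbstractMatrix{Int})
    c1, c2 = cluster_counts(M1), cluster_counts(M2)
    d = 0
    for k in union(keys(c1), keys(c2))
        d += abs(get(c1, k, 0) - get(c2, k, 0))
    end
    return d
end

# matrices copied from the docstring example: hardwiredClusters(net1, taxa)
# and hardwiredClusters(net2, taxa), with taxa t1,...,t6
M1 = [13 1 1 1 1 1 0 10;
      12 1 1 1 1 0 0 10;
      10 1 1 1 1 0 0 10;
       9 1 1 1 0 0 0 10;
       8 1 1 0 0 0 0 11;
       7 1 1 0 0 0 0 10]
M2 = [13 1 1 1 1 1 0 10;
      12 1 1 1 1 0 0 10;
       6 0 0 1 1 0 0 10;
       5 0 0 1 0 0 0 11;
      11 1 1 1 0 0 0 10;
      10 1 1 0 0 0 0 10]
d = toy_cluster_distance(M1, M2)
```

The four unmatched pairs are {t1,t2,t3,t4} (tree edge, 2 vs 1 copies), {t1,t2} (hybrid edge, 1 vs 0), {t3,t4} (tree edge, 0 vs 1) and {t3} (hybrid edge, 0 vs 1), which gives the value 4 reported by `hardwiredClusterDistance(net1, net2, true)`.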
14 changes: 9 additions & 5 deletions src/manipulateNet.jl
@@ -95,11 +95,14 @@ end

"""
hybridatnode!(net::HybridNetwork, nodeNumber::Integer)
hybridatnode(net, nodeNumber)
Change the status of edges in network `net`,
to move the hybrid node in a cycle to the node with number `nodeNumber`.
This node must be in one (and only one) cycle, otherwise an error will be thrown.
The second method does not modify `net`, checks that it's of level 1, and
returns the new network after hybrid modification.
`net` is assumed to be of level 1, that is, each blob has a
single cycle with a single reticulation.
@@ -175,11 +178,10 @@ function hybridatnode!(net::HybridNetwork, hybrid::Node, newNode::Node)
end
end

# function to change the hybrid node in a cycle
# does not assume that the network was read with readTopologyUpdate
# does not modify net0 because it needs to update all attributes
# so, it returns the new network
# Not used anywhere, but tested
# does not call hybridatnode! but repeats its code: oops! violates DRY principle
# nodeNumber should correspond to the number assigned by readTopologyLevel1,
# and the node numbers in `net` are irrelevant.
@doc (@doc hybridatnode!) hybridatnode
function hybridatnode(net0::HybridNetwork, nodeNumber::Integer)
net = readTopologyLevel1(writeTopologyLevel1(net0)) # we need inCycle attributes
@@ -411,6 +413,8 @@ PhyloNetworks.EdgeT{PhyloNetworks.Node}:
julia> writeTopology(net) # note extra pair of parentheses around S1
"(((S8,S9),((((S4,(S1)),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"
```
See also: [`fuseedgesat!`](@ref)
"""
function breakedge!(edge::Edge, net::HybridNetwork)
pn = getparent(edge) # parent node
@@ -447,7 +451,7 @@ The parent and child edges of this node are fused.
If either of the edges is hybrid, the hybrid edge is retained. Otherwise, the
edge with the lower edge number is retained.
Reverts the action of breakedge!.
Reverts the action of [`breakedge!`](@ref).
Returns the fused edge.
"""