-
Notifications
You must be signed in to change notification settings - Fork 2
6. nodeomega
If you want to use Bayescode to infer changes in ω and μ along the phylogeny, use the program nodeomega
to run the MCMC and readnodeomega
to read the trace.
Of note, this program runs faster than nodemutsel
and scale better with a larger number of species, because is does not estimate site-specific fitness profiles.
The results are not interpretable directly as changes in Ne, but instead in the change of the global strength of selection (ω). ω is related to the underlying genetic drift if the genes are effectively constrained (no adaptation).
So for examples species with large Ne will have stronger selection, so ω closer to 0.
Contrarily, species with small Ne will have weak selection, so ω closer to 1.
To run a BayesCode chain called run_nodeomega_placentalia
on example placental mammals data (from the data
folder) and plot the resulting trees with the commands:
nodeomega -a data/placentalia/plac.ali -t data/placentalia/plac.nhx --until 2000 run_nodeomega_placentalia
readnodeomega --burnin 1000 --until 2000 --newick run_nodeomega_placentalia
plot_tree.py --input run_nodeomega_placentalia
To include quantitative traits (optional), use the option --traitsfile
with the path to the file containing the traits in tsv format:
nodenodeomega -a data/placentalia/plac.ali -t data/placentalia/plac.nhx --traitsfile data/placentalia/plac.log.lht -u 2000 run_nodeomega_placentalia
The file containing the traits must have the following (tab-delimited) format:
TaxonName | Maturity | Mass | Longevity |
---|---|---|---|
Bos | 6.47 | 13.52 | 3.21 |
Rousettus | 5.55 | 4.71 | 2.91 |
Spermophilus | 5.92 | 5.60 | 2.10 |
Artibeus | NaN | 3.98 | 2.95 |
The columns are:
- TaxonName: the name of the taxon matching the name in the alignment and the tree.
- As many columns as traits, without spaces or special characters in the trait.
The values can be NaN
to indicate that the trait is not available for that taxon.
The values of life-history traits (Maturity, Mass, Longevity) are log-transformed, to see why, here is a nice explanation: Biology under the log.
A python script traits_coevol_to_mutsel.py
is available to convert the traits file from CoEvol format (in natural-space, see github.com/bayesiancook/coevo for the format definition) to the format used by BayesCode (log-transformed):
traits_coevol_to_mutsel.py --input data/placentalia/plac.lht --output data/placentalia/plac.log.lht
To include fossil calibrations (optional), use the option --fossils
with the path to the file containing the calibrations in tsv format:
nodenodeomega -a data/placentalia/plac.ali -t data/placentalia/plac.nhx --fossils data/placentalia/plac.calibs.tsv -u 2000 run_nodeomega_placentalia
The file containing the calibrations must have the following (tab-delimited) format:
NodeName | Age | LowerBound | UpperBound |
---|---|---|---|
EquusCeratTapir | 56.0 | 54 | 58.0 |
SuBoTrHiMeTuLaVi | 65.0 | 0 | 65.0 |
MusRattus | 12.0 | 12 | inf |
MegadPteroRouse | 51.5 | 43 | 60.0 |
The columns are:
-
NodeName: the name of the internal node matching the name in the input tree.
Root
can be used to designate the root of the tree. - Age: the age of the node, must be between LowerBound and UpperBound.
- LowerBound: the lower bound (youngest) for the fossil calibration (can be 0).
-
UpperBound: the upper bound (oldest) for the fossil calibration, can be
inf
to indicate that the calibration is not available for that node.
A python script calibs_coevol_to_mutsel.py
is available to convert the traits file from the CoEvol format (using the most recent ancestor between two extent taxa, see github.com/bayesiancook/coevo for the format definition) to the format used by BayesCode (requires a name for internal nodes):
calibs_coevol_to_mutsel.py --input data/placentalia/plac.calibs --tree data/placentalia/plac.calibs --output data/placentalia/plac.calibs.tsv
If the tree provided does not include name for internal nodes, the script will give them names and re-write the tree in the same file.
To obtain the annotated newick tree (ω, μ, quantitative traits if included) from the chain run_nodeomega_placentalia
, discarding the first 1000 points:
readnodeomega --burnin 1000 --until 2000 --newick run_nodeomega_placentalia
This command will generate a tree in newick extended format (.nhx) for each trait in the natural-space.
A python script plot_tree.py
is available to draw the trees (in .pdf) generated by readnodemutsel
, using the chain name (here run_nodeomega_placentalia
):
plot_tree.py --input run_nodeomega_placentalia
To obtain the relationship between traits (ω, μ, quantitative traits) from the chain run_nodeomega_placentalia
, discarding the first 1000 points:
readnodeomega --burnin 1000 --until 2000 --cov run_nodeomega_placentalia
This command will generate the file run_nodeomega_placentalia.cov
containing several matrices of relationship between traits:
-
covariances
Usually denoted Σ (or R matrix as in L. Harmon PCM book).
Measure of the joint variability between two traits.
Elements on the principal diagonal are variances (strictly positive), and off-diagonal elements are covariances. -
correlation coefficients
Equals to diag(Σ)⁻¹⁄₂ ✕ Σ ✕ diag(Σ)⁻¹⁄₂, see wikipedia for details.
Each element on the principal diagonal is the correlation of the trait with itself, which always equals 1.
Each off-diagonal element is between −1 and +1 inclusive. -
posterior probabilities of a positive coefficient
Each off-diagonal element is the probability that the correlation between the two traits is positive. -
precisions
Usually denoted Ω = Σ⁻¹, the precision matrix (also called concentration matrix) is the matrix inverse of the covariance matrix. -
partial correlation coefficients
Equals to - diag(Ω)⁻¹⁄₂ ✕ Ω ✕ diag(Ω)⁻¹⁄₂, see wiki wikipedia for details.
Partial correlation measures the degree of association between two traits, with the effect of all other trait associations removed.
Each off-diagonal element is between −1 and +1 inclusive. -
posterior probabilities of a positive partial coefficient
Each off-diagonal element is the probability that the partial correlation between the two traits is positive.
The entries of the matrices are in the order specified in the header. The posterior probabilities of a positive correlation (pp) are particularly important. A pp close to 1 means a strong statistical support for a positive correlation, and a pp close to 0 a supported negative correlation (the posterior probabilities for a negative correlation are given by 1 − pp).
The options for nodeomega
are:
--fossils <string> (default: None)
File path to the fossils calibration in tsv format with columns
`NodeName`, `Age, `LowerBound` and `UpperBound`.
--traitsfile <string> (default: None)
File path to the quantitative trait (in log-space) in tsv format. The
First column is `TaxonName` (taxon matching the name in the alignment)
and the next columns are traits.
You can use one of these options with readnodeomega
to compute a specific posterior statistic:
--newick (default: false)
Computes the mean posterior node-specific entries of the multivariate
Brownian process. Each entry of the multivariate Brownian process is
written in a newick extended (.nhx) format file.
--cov (default: false)
Computes the mean posterior covariance matrix.