Skip to content

6. nodeomega

T. Latrille edited this page Jan 16, 2025 · 4 revisions

If you want to use Bayescode to infer changes in ω and μ along the phylogeny, use the program nodeomega to run the MCMC and readnodeomega to read the trace.

Of note, this program runs faster than nodemutsel and scale better with a larger number of species, because is does not estimate site-specific fitness profiles. The results are not interpretable directly as changes in Ne, but instead in the change of the global strength of selection (ω). ω is related to the underlying genetic drift if the genes are effectively constrained (no adaptation). So for examples species with large Ne will have stronger selection, so ω closer to 0. Contrarily, species with small Ne will have weak selection, so ω closer to 1.

To run a BayesCode chain called run_nodeomega_placentalia on example placental mammals data (from the data folder) and plot the resulting trees with the commands:

nodeomega -a data/placentalia/plac.ali -t data/placentalia/plac.nhx --until 2000 run_nodeomega_placentalia
readnodeomega --burnin 1000 --until 2000 --newick run_nodeomega_placentalia
plot_tree.py --input run_nodeomega_placentalia

I. Including quantitative traits (optional)

To include quantitative traits (optional), use the option --traitsfile with the path to the file containing the traits in tsv format:

nodenodeomega -a data/placentalia/plac.ali -t data/placentalia/plac.nhx --traitsfile data/placentalia/plac.log.lht -u 2000 run_nodeomega_placentalia

The file containing the traits must have the following (tab-delimited) format:

TaxonName Maturity Mass Longevity
Bos 6.47 13.52 3.21
Rousettus 5.55 4.71 2.91
Spermophilus 5.92 5.60 2.10
Artibeus NaN 3.98 2.95

The columns are:

  • TaxonName: the name of the taxon matching the name in the alignment and the tree.
  • As many columns as traits, without spaces or special characters in the trait.

The values can be NaN to indicate that the trait is not available for that taxon.
The values of life-history traits (Maturity, Mass, Longevity) are log-transformed, to see why, here is a nice explanation: Biology under the log.

A python script traits_coevol_to_mutsel.py is available to convert the traits file from CoEvol format (in natural-space, see github.com/bayesiancook/coevo for the format definition) to the format used by BayesCode (log-transformed):

traits_coevol_to_mutsel.py --input data/placentalia/plac.lht --output data/placentalia/plac.log.lht

II. Including fossil calibration (optional)

To include fossil calibrations (optional), use the option --fossils with the path to the file containing the calibrations in tsv format:

nodenodeomega -a data/placentalia/plac.ali -t data/placentalia/plac.nhx --fossils data/placentalia/plac.calibs.tsv -u 2000 run_nodeomega_placentalia

The file containing the calibrations must have the following (tab-delimited) format:

NodeName Age LowerBound UpperBound
EquusCeratTapir 56.0 54 58.0
SuBoTrHiMeTuLaVi 65.0 0 65.0
MusRattus 12.0 12 inf
MegadPteroRouse 51.5 43 60.0

The columns are:

  • NodeName: the name of the internal node matching the name in the input tree.
    Root can be used to designate the root of the tree.
  • Age: the age of the node, must be between LowerBound and UpperBound.
  • LowerBound: the lower bound (youngest) for the fossil calibration (can be 0).
  • UpperBound: the upper bound (oldest) for the fossil calibration, can be inf to indicate that the calibration is not available for that node.

A python script calibs_coevol_to_mutsel.py is available to convert the traits file from the CoEvol format (using the most recent ancestor between two extent taxa, see github.com/bayesiancook/coevo for the format definition) to the format used by BayesCode (requires a name for internal nodes):

calibs_coevol_to_mutsel.py --input data/placentalia/plac.calibs --tree data/placentalia/plac.calibs --output data/placentalia/plac.calibs.tsv

If the tree provided does not include name for internal nodes, the script will give them names and re-write the tree in the same file.

III. Read annotated trees

To obtain the annotated newick tree (ω, μ, quantitative traits if included) from the chain run_nodeomega_placentalia, discarding the first 1000 points:

readnodeomega --burnin 1000 --until 2000 --newick run_nodeomega_placentalia

This command will generate a tree in newick extended format (.nhx) for each trait in the natural-space.

A python script plot_tree.py is available to draw the trees (in .pdf) generated by readnodemutsel, using the chain name (here run_nodeomega_placentalia):

plot_tree.py --input run_nodeomega_placentalia

IV. Read covariance matrix

To obtain the relationship between traits (ω, μ, quantitative traits) from the chain run_nodeomega_placentalia, discarding the first 1000 points:

readnodeomega --burnin 1000 --until 2000 --cov run_nodeomega_placentalia

This command will generate the file run_nodeomega_placentalia.cov containing several matrices of relationship between traits:

  • covariances
    Usually denoted Σ (or R matrix as in L. Harmon PCM book).
    Measure of the joint variability between two traits.
    Elements on the principal diagonal are variances (strictly positive), and off-diagonal elements are covariances.
  • correlation coefficients
    Equals to diag(Σ)⁻¹⁄₂ ✕ Σ ✕ diag(Σ)⁻¹⁄₂, see wikipedia for details.
    Each element on the principal diagonal is the correlation of the trait with itself, which always equals 1.
    Each off-diagonal element is between −1 and +1 inclusive.
  • posterior probabilities of a positive coefficient
    Each off-diagonal element is the probability that the correlation between the two traits is positive.
  • precisions
    Usually denoted Ω = Σ⁻¹, the precision matrix (also called concentration matrix) is the matrix inverse of the covariance matrix.
  • partial correlation coefficients
    Equals to - diag(Ω)⁻¹⁄₂ ✕ Ω ✕ diag(Ω)⁻¹⁄₂, see wiki wikipedia for details.
    Partial correlation measures the degree of association between two traits, with the effect of all other trait associations removed.
    Each off-diagonal element is between −1 and +1 inclusive.
  • posterior probabilities of a positive partial coefficient
    Each off-diagonal element is the probability that the partial correlation between the two traits is positive.

The entries of the matrices are in the order specified in the header. The posterior probabilities of a positive correlation (pp) are particularly important. A pp close to 1 means a strong statistical support for a positive correlation, and a pp close to 0 a supported negative correlation (the posterior probabilities for a negative correlation are given by 1 − pp).

V. Options for nodeomega and readnodeomega

The options for nodeomega are:

--fossils <string> (default: None)
 File path to the fossils calibration in tsv format with columns
 `NodeName`, `Age, `LowerBound` and `UpperBound`.

--traitsfile <string> (default: None)
 File path to the quantitative trait (in log-space) in tsv format. The
 First column is `TaxonName` (taxon matching the name in the alignment)
 and the next columns are traits.

You can use one of these options with readnodeomega to compute a specific posterior statistic:

--newick (default: false)
 Computes the mean posterior node-specific entries of the multivariate
 Brownian process. Each entry of the multivariate Brownian process is
 written in a newick extended (.nhx) format file.

--cov (default: false)
 Computes the mean posterior covariance matrix.