Skip to content

4. mutselomega

T. Latrille edited this page Oct 4, 2023 · 1 revision

If you want to use the Mutation-Selection framework for detecting site-specific adaptive evolution, use the program mutselomega to run the MCMC and readmutselomega to read the trace.

I. Mutation-selection codon models (ω0 and S0)

Mutation-selection codon models are obtained by running mutselomega for 2000 points of MCMC with the options:

mutselomega --ncat 30 -a data/bglobin/bglobin.phy -t data/bglobin/bglobin.tre --until 2000 run_mutsel_bglobin

Posterior mutation matrix (μ)
The gene-specific mutation matrix (μ) is obtained by running readmutselomega, reading 1000 points of MCMC (first 1000 are considered as burn-in) with the options:

readmutselomega --every 1 --until 2000 --burnin 1000 --nuc run_mutsel_bglobin

Posterior site-specific rate of evolution (ω0)

⚠️ Please use BayesCode v1.3.0 or more recent to compute ω₀ ⚠️

The site-specific predicted rate of evolution (ω0(i): a scalar between 0 and 1 for each site) are obtained by running readmutselomega, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_mutsel_bglobin.ci0.025.tsv with the options:

readmutselomega --every 1 --until 2000 --burnin 1000 --confidence_interval 0.025 --omega_0 run_mutsel_bglobin

This will output the mean posterior ω0(i) over the MCMC as well as the 95% (1 - 0.025*2) credible interval. The value of confidence_interval used as input is considered for each side of the distribution, hence a factor 2 for the credible interval.

Posterior scaled selection coefficient for a codon transition (S0)
The collection of site-specific fitness profiles (F(i): 20x1 vector for each site) are then obtained by running readmutselomega, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_mutsel_bglobin.siteprofiles with the options:

readmutselomega --every 1 --until 2000 --burnin 1000 --ss run_mutsel_bglobin

A python script fitness_to_selcoeff.py is available to compute scaled selection coefficients (S0) from a list of codon transitions.
The script takes as input a collection of site-specific fitness profiles for a gene generated by BayesCode, and a list of positions with the associated codon transitions, it will output the list of selection coefficients for each codon transition.

fitness_to_selcoeff.py --input_transitions data/bglobin/bglobin.transitions.tsv --input_profiles run_mutsel_bglobin.siteprofiles --output bglobin.transitions.selcoeffs.tsv

Help for the script is also available:

fitness_to_selcoeff.py --help

II. Classical codon models (ω)

Classical codon models (multiplicative factor ω for non-synonymous substitutions) are obtained by running mutselomega for 2000 points of MCMC with the options:

mutselomega --freeomega --omegancat 30 --flatfitness -a data/bglobin/bglobin.phy -t data/bglobin/bglobin.tre -u 2000 run_classical_bglobin

Posterior site-specific rate of evolution (ω)
The site-specific predicted rate of evolution (ω(i): a positif scalar) is obtained by running readmutselomega, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_classical_bglobin.ci0.025.tsv with the options:

readmutselomega --every 1 --until 2000 --burnin 1000 --confidence_interval 0.025 --omega run_classical_bglobin

This will output the mean posterior ω(i) over the MCMC as well as the 95% (1 - 0.025*2) credible interval. The value of confidence_interval used as input is considered for each side of the distribution, hence a factor 2 for the credible interval.

III. Mutation-selection codon models with a multiplicative factor (ω)

Mutation-selection codon models with a ω multiplicative factor (MutSel-M3, see doi.org/10.1093/molbev/msaa265) are obtained by running mutselomega for 2000 points of MCMC with the options:

mutselomega --freeomega --omegancat 3 --ncat 30 -a data/bglobin/bglobin.phy -t data/bglobin/bglobin.tre -u 2000 run_mutselM3_bglobin

Posterior site-specific rate of evolution (ω)
The site-specific posterior probabilities in favor of a value greater than 1 (p(ω > 1)) are obtained by running readmutselomega, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_classical_bglobin.omegappgt1.0 as a .tsv file with the options:

readmutselomega --every 1 --until 2000 --burnin 1000 --omega_threshold 1.0 run_mutselM3_bglobin

IV. Options for mutselomega and readmutselomega

The options for mutselomega are:

--ncat <int> (default: 30)
 Number of components for the amino-acid fitness profiles (truncation
 of the stick-breaking process).

--profiles <string> (default: None)
 File path the fitness profiles (tsv or csv), thus considered fixed.
 Each line must contains the fitness of each of the 20 amino-acid, thus
 summing to one. If same number of profiles as the codon alignment,
 site allocations are considered fixed. If smaller than the alignment
 size, site allocations are computed and `ncat` is given by the number
 of profiles in the file.
   
--flatfitness (default: false)
 Fitness profiles are flattened (and `ncat` equals to 1). This option
 is not compatible with the option `profiles`.
    
--freeomega (default: false)
 ω is allowed to vary (default ω is 1.0). Combined with the
 option `flatfitness`, we obtain the classical, ω-based codon model
 (Muse & Gaut). Without the option `flatfitness`, we obtain the
 mutation-selection codon model with a multiplicative factor (ω⁎).

--omegancat <int> (default: 1)
 Number of components for ω (finite mixture).
      
--omegaarray <string> (default: None)
 File path to ω values (one ω per line), thus considered
 fixed. `freeomega` is overridden to false and `omegancat` equals to the
 number of ω in the file.

--omegashift <double> (default: 0.0)
 Additive shift applied to all ω (0.0 for the general case).

You can use one of these options with readmutselomega to compute a specific posterior statistic:

--omega_threshold <string>
  Threshold to compute the mean posterior probability that ω⁎ (or ω
  if option `flatfitness` is used in `mutselomega`) is greater than a
  given value (1.0 to test for adaptation). Results are written in
  {chain_name}.omegappgt{omega_pp}.tsv by default (optionally use the
  --output argument to specify a different output path).

-s,  --ss
  Computes the mean posterior site-specific amino-acid equilibrium
  frequencies(amino-acid fitness profiles). Results are written in
  {chain_name}.siteprofiles by default (optionally use the --output
  argument  to specify a different output path).

-c <string>,  --confidence_interval <string>
  Boundary for posterior credible interval of ω and ω₀ (per site and
  at the gene level). Default value is 0.025 at each side, meaning
  computing the 1-2*0.025=95% CI.

--omega_0
  Compute posterior credible interval for ω₀ predicted at the
  mutation-selection equilibrium from the fitness profiles, for each
  site and at the gene level. Can be combined with the option
  `confidence_interval` to change the default value (0.025 at each side
  of the distribution). Results are written in
  {chain_name}.ci{confidence_interval}.tsv by default (optionally use
  the --output argument to specify a different output path).

--omega
  Compute posterior credible interval for ω for each site and at the
  gene level. Can be combined with the option `confidence_interval` to
  change the default value (0.025 at each side of the distribution).
  Results are written in {chain_name}.ci{confidence_interval}.tsv by
  default (optionally use the --output argument to specify a different
  output path).

--chain_omega <string>
  A second chain ran with the option --freeomega and --flatfitness to
  obtain the classical ω-based codon model (Muse & Gaut). These two
  chains allow to compute posterior of ω, ω₀, ωᴬ=ω-ω₀ and
  p(ωᴬ>0) for each site and at the gene level. Results are written in
  {chain_name}.omegaA.tsv by default (optionally use the --output
  argument  to specify a different output path).

-n,  --nuc
  Mean posterior 4x4 nucleotide matrix.Results are written in
  {chain_name}.nucmatrix.tsv by default (optionally use the --output
  argument  to specify a different output path).