-
Notifications
You must be signed in to change notification settings - Fork 2
4. mutselomega
If you want to use the Mutation-Selection framework for detecting site-specific adaptive evolution, use the program mutselomega
to run the MCMC and readmutselomega
to read the trace.
Mutation-selection codon models are obtained by running mutselomega
for 2000 points of MCMC with the options:
mutselomega --ncat 30 -a data/bglobin/bglobin.phy -t data/bglobin/bglobin.tre --until 2000 run_mutsel_bglobin
Posterior mutation matrix (μ)
The gene-specific mutation matrix (μ) is obtained by running readmutselomega
, reading 1000 points of MCMC (first 1000 are considered as burn-in) with the options:
readmutselomega --every 1 --until 2000 --burnin 1000 --nuc run_mutsel_bglobin
Posterior site-specific rate of evolution (ω0)
The site-specific predicted rate of evolution (ω0(i): a scalar between 0 and 1 for each site) are obtained by running readmutselomega
, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_mutsel_bglobin.ci0.025.tsv
with the options:
readmutselomega --every 1 --until 2000 --burnin 1000 --confidence_interval 0.025 --omega_0 run_mutsel_bglobin
This will output the mean posterior ω0(i) over the MCMC as well as the 95% (1 - 0.025*2) credible interval. The value of confidence_interval used as input is considered for each side of the distribution, hence a factor 2 for the credible interval.
Posterior scaled selection coefficient for a codon transition (S0)
The collection of site-specific fitness profiles (F(i): 20x1 vector for each site) are then obtained by running readmutselomega
, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_mutsel_bglobin.siteprofiles
with the options:
readmutselomega --every 1 --until 2000 --burnin 1000 --ss run_mutsel_bglobin
A python script fitness_to_selcoeff.py
is available to compute scaled selection coefficients (S0) from a list of codon transitions.
The script takes as input a collection of site-specific fitness profiles for a gene generated by BayesCode, and a list of positions with the associated codon transitions, it will output the list of selection coefficients for each codon transition.
fitness_to_selcoeff.py --input_transitions data/bglobin/bglobin.transitions.tsv --input_profiles run_mutsel_bglobin.siteprofiles --output bglobin.transitions.selcoeffs.tsv
Help for the script is also available:
fitness_to_selcoeff.py --help
Classical codon models (multiplicative factor ω for non-synonymous substitutions) are obtained by running mutselomega
for 2000 points of MCMC with the options:
mutselomega --freeomega --omegancat 30 --flatfitness -a data/bglobin/bglobin.phy -t data/bglobin/bglobin.tre -u 2000 run_classical_bglobin
Posterior site-specific rate of evolution (ω)
The site-specific predicted rate of evolution (ω(i): a positif scalar) is obtained by running readmutselomega
, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_classical_bglobin.ci0.025.tsv
with the options:
readmutselomega --every 1 --until 2000 --burnin 1000 --confidence_interval 0.025 --omega run_classical_bglobin
This will output the mean posterior ω(i) over the MCMC as well as the 95% (1 - 0.025*2) credible interval.
The value of confidence_interval
used as input is considered for each side of the distribution, hence a factor 2 for the credible interval.
Mutation-selection codon models with a ω∗ multiplicative factor (MutSel-M3, see doi.org/10.1093/molbev/msaa265) are obtained by running mutselomega
for 2000 points of MCMC with the options:
mutselomega --freeomega --omegancat 3 --ncat 30 -a data/bglobin/bglobin.phy -t data/bglobin/bglobin.tre -u 2000 run_mutselM3_bglobin
Posterior site-specific rate of evolution (ω∗)
The site-specific posterior probabilities in favor of a value greater than 1 (p(ω∗ > 1)) are obtained by running readmutselomega
, reading 1000 points of MCMC (first 1000 are considered as burn-in) and written in the file run_classical_bglobin.omegappgt1.0
as a .tsv file with the options:
readmutselomega --every 1 --until 2000 --burnin 1000 --omega_threshold 1.0 run_mutselM3_bglobin
The options for mutselomega
are:
--ncat <int> (default: 30)
Number of components for the amino-acid fitness profiles (truncation
of the stick-breaking process).
--profiles <string> (default: None)
File path the fitness profiles (tsv or csv), thus considered fixed.
Each line must contains the fitness of each of the 20 amino-acid, thus
summing to one. If same number of profiles as the codon alignment,
site allocations are considered fixed. If smaller than the alignment
size, site allocations are computed and `ncat` is given by the number
of profiles in the file.
--flatfitness (default: false)
Fitness profiles are flattened (and `ncat` equals to 1). This option
is not compatible with the option `profiles`.
--freeomega (default: false)
ω is allowed to vary (default ω is 1.0). Combined with the
option `flatfitness`, we obtain the classical, ω-based codon model
(Muse & Gaut). Without the option `flatfitness`, we obtain the
mutation-selection codon model with a multiplicative factor (ω⁎).
--omegancat <int> (default: 1)
Number of components for ω (finite mixture).
--omegaarray <string> (default: None)
File path to ω values (one ω per line), thus considered
fixed. `freeomega` is overridden to false and `omegancat` equals to the
number of ω in the file.
--omegashift <double> (default: 0.0)
Additive shift applied to all ω (0.0 for the general case).
You can use one of these options with readmutselomega
to compute a specific posterior statistic:
--omega_threshold <string>
Threshold to compute the mean posterior probability that ω⁎ (or ω
if option `flatfitness` is used in `mutselomega`) is greater than a
given value (1.0 to test for adaptation). Results are written in
{chain_name}.omegappgt{omega_pp}.tsv by default (optionally use the
--output argument to specify a different output path).
-s, --ss
Computes the mean posterior site-specific amino-acid equilibrium
frequencies(amino-acid fitness profiles). Results are written in
{chain_name}.siteprofiles by default (optionally use the --output
argument to specify a different output path).
-c <string>, --confidence_interval <string>
Boundary for posterior credible interval of ω and ω₀ (per site and
at the gene level). Default value is 0.025 at each side, meaning
computing the 1-2*0.025=95% CI.
--omega_0
Compute posterior credible interval for ω₀ predicted at the
mutation-selection equilibrium from the fitness profiles, for each
site and at the gene level. Can be combined with the option
`confidence_interval` to change the default value (0.025 at each side
of the distribution). Results are written in
{chain_name}.ci{confidence_interval}.tsv by default (optionally use
the --output argument to specify a different output path).
--omega
Compute posterior credible interval for ω for each site and at the
gene level. Can be combined with the option `confidence_interval` to
change the default value (0.025 at each side of the distribution).
Results are written in {chain_name}.ci{confidence_interval}.tsv by
default (optionally use the --output argument to specify a different
output path).
--chain_omega <string>
A second chain ran with the option --freeomega and --flatfitness to
obtain the classical ω-based codon model (Muse & Gaut). These two
chains allow to compute posterior of ω, ω₀, ωᴬ=ω-ω₀ and
p(ωᴬ>0) for each site and at the gene level. Results are written in
{chain_name}.omegaA.tsv by default (optionally use the --output
argument to specify a different output path).
-n, --nuc
Mean posterior 4x4 nucleotide matrix.Results are written in
{chain_name}.nucmatrix.tsv by default (optionally use the --output
argument to specify a different output path).