-
Notifications
You must be signed in to change notification settings - Fork 49
Gene Trees: MrBayes
We want to analyze each of the 30 loci with MrBayes. First, make sure you have MrBayes installed. On the MBL cluster, MbBayes is available via the "phylonetworks" module, so load it up and check:
$ module load phylonetworks
$ which mb
/class/molevol-software/mrbayes-3.2.6/src/mb
Next, choose settings for MrBayes: model, prior for branch lengths etc. Save them in a MrBayes block. Below: HKY model, 100,000 generations 2 chains (1 cold & 1 heated), 2 independent runs. These settings were chosen to run things fast during this tutorial, but for a real data set different setting should be chosen (such as 1 million generations, 3 chains and 3 runs).
$ cat ../scripts/mbblock.txt
begin mrbayes;
set nowarnings=yes;
set autoclose=yes;
lset nst=2;
mcmcp ngen=100000 burninfrac=.25 samplefreq=50 printfreq=10000 [increase these for real]
diagnfreq=10000 nruns=2 nchains=2 temp=0.40 swapfreq=10; [increase for real analysis]
mcmc;
sumt;
end;
We are ready to analyze all loci with MrBayes:
$ ../scripts/mb.pl input/1_seqgen.tar.gz -m ../scripts/mbblock.txt -o mb-output
Script was called as follows:
perl mb.pl input/1_seqgen.tar.gz -m ../scripts/mbblock.txt -o mb-output
Appending MrBayes block to each gene... done.
Job server successfully created.
Analyses complete: 30/30.
All connections closed.
Total execution time: 46 seconds.
If a cluster is available with different machines, analyses can be parallelized
across machines (not just across nodes of the same machine) by adding an option
--machine-file hosts.txt
, where hosts.txt
is a simple text
file listing the machines available to use, in the format user_name@machine_address
.
This file might look like this:
The script created a new directory named mb-output
(like we asked above),
which contains a compressed tarball of all MrBayes output: mb-output/1_seqgen.mb.tar
$ ls
input mb-output
$ ls mb-output/
1_seqgen.mb.tar 1_seqgen.tar.gz
$ tar -tf mb-output/1_seqgen.mb.tar
1_seqgen12.nex.tar.gz
1_seqgen11.nex.tar.gz
1_seqgen10.nex.tar.gz
...
1_seqgen7.nex.tar.gz
1_seqgen8.nex.tar.gz
1_seqgen9.nex.tar.gz
Let's look at the result file for the first locus. For this, let's go into
the new mb-output
folder (we will need to go back to the main folder later),
create a folder 1_seqgen.mb
for exanding all the MrBayes results,
decompress the results for the first locus, and finally look at them:
cd mb-output
mkdir 1_seqgen.mb
tar -xvf 1_seqgen.mb.tar -C 1_seqgen.mb
ls 1_seqgen.mb
cd 1_seqgen.mb
mkdir 1_seqgen1.nex
tar -xzvf 1_seqgen1.nex.tar.gz -C 1_seqgen1.nex
ls 1_seqgen1.nex
We find a bunch of output including the log from MrBayes
(useful to track down bugs, if any) and the sample of
trees from each run (*.t
), which will serve as input for BUCKy.
$ ls 1_seqgen1.nex
1_seqgen1.nex.ckp 1_seqgen1.nex.mcmc 1_seqgen1.nex.run2.t 1_seqgen1.nex.vstat
1_seqgen1.nex.ckp~ 1_seqgen1.nex.parts 1_seqgen1.nex.run2.p
1_seqgen1.nex.con.tre 1_seqgen1.nex.run1.p 1_seqgen1.nex.trprobs
1_seqgen1.nex.log 1_seqgen1.nex.run1.t 1_seqgen1.nex.tstat
The most important files are those containing the sampled gene trees:
$ less -S 1_seqgen1.nex/1_seqgen1.nex.run1.t
#NEXUS
[ID: 5353870756]
[Param: tree]
begin trees;
translate
1 6,
2 5,
3 1,
4 2,
5 3,
6 4;
tree gen.0 = [&U] ((4:2.000000e-02,(6:2.000000e-02,2:2.000000e-02):2.000000e-02):2.000000e-02,(5:2.000000e-02,3:2.000000e-02):2.000000e-02,1:2.000000e-02);
tree gen.50 = [&U] (((4:6.749905e-03,3:1.396825e-02):1.069614e-02,(5:2.164948e-02,6:7.205712e-03):1.861635e-02):3.204673e-02,2:2.699199e-02,1:3.668708e-02);
tree gen.100 = [&U] (((4:5.960128e-03,3:6.373705e-03):1.170556e-02,(5:1.603174e-02,6:6.627496e-03):8.926058e-03):2.721952e-02,2:2.638575e-02,1:6.115106e-02);
tree gen.150 = [&U] (((4:9.868770e-03,3:2.805685e-03):1.086845e-02,(5:2.218544e-02,6:6.324761e-03):9.347534e-03):1.152684e-02,2:5.165054e-02,1:6.528826e-02);
tree gen.200 = [&U] (((4:1.098244e-02,3:2.853569e-03):1.351000e-02,(5:1.345820e-02,6:1.902973e-02):1.005224e-02):1.344231e-02,2:4.246484e-02,1:8.722215e-02);
...
tree gen.99850 = [&U] (((4:2.183261e-02,3:4.841606e-03):4.529830e-03,(5:1.139391e-02,6:1.407128e-02):1.184515e-02):3.180385e-02,2:2.979942e-02,1:5.423444e-02)
tree gen.99900 = [&U] (((6:1.200251e-02,5:1.781890e-02):1.025264e-02,(4:1.707598e-02,3:2.540760e-04):1.734718e-02):2.143296e-02,2:2.929320e-02,1:7.620077e-02)
tree gen.99950 = [&U] (((4:2.988533e-03,3:3.844257e-03):1.021980e-02,(5:2.239675e-02,6:1.117266e-02):8.290780e-03):2.668343e-02,2:2.229218e-02,1:4.864512e-02)
tree gen.100000 = [&U] (((6:1.238475e-02,5:1.200429e-02):7.875171e-03,(4:5.683504e-03,3:3.564505e-03):1.675113e-02):2.143296e-02,2:2.929320e-02,1:7.620077e-02
end;
Type G
to go to the end of the file, g
to come back to the beginning,
and q
to quit viewing the file.
There are also trees from the second independent run:
less -S 1_seqgen1.nex/1_seqgen1.nex.run2.t
We won't be using it (because we will be using the full list of all sampled trees), but we do have 50% majority rule consensus tree from the posterior distribution:
$ cat 1_seqgen1.nex/1_seqgen1.nex.con.tre
#NEXUS
[ID: 5353870756]
begin taxa;
dimensions ntax=6;
taxlabels
6
5
1
2
3
4
;
end;
begin trees;
translate
1 6,
2 5,
3 1,
4 2,
5 3,
6 4
;
tree con_50_majrule = [&U] (1[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:7.315541e-02[&length_mean=7.35856944e-02,length_median=7.31554100e-02,length_95%HPD={5.05359800e-02,9.70821400e-02}],2[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:3.569409e-02[&length_mean=3.66608383e-02,length_median=3.56940900e-02,length_95%HPD={2.11883100e-02,5.53439400e-02}],((3[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:3.064436e-03[&length_mean=3.66163042e-03,length_median=3.06443600e-03,length_95%HPD={8.10284600e-06,8.60207700e-03}],4[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:8.642618e-03[&length_mean=9.32276055e-03,length_median=8.64261800e-03,length_95%HPD={2.03317100e-03,1.75520400e-02}])[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:1.098585e-02[&length_mean=1.16285593e-02,length_median=1.09858500e-02,length_95%HPD={3.41401000e-03,2.08909000e-02}],(5[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:1.455524e-02[&length_mean=1.51849277e-02,length_median=1.45552400e-02,length_95%HPD={5.76476500e-03,2.61160100e-02}],6[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:1.063229e-02[&length_mean=1.13693922e-02,length_median=1.06322900e-02,length_95%HPD={3.20096100e-03,2.05437800e-02}])[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:1.101159e-02[&length_mean=1.16580154e-02,length_median=1.10115900e-02,length_95%HPD={3.05337700e-03,2.09244800e-02}])[&prob=1.00000000e+00,prob_stddev=0.00000000e+00,prob_range={1.00000000e+00,1.00000000e+00},prob(percent)="100",prob+-sd="100+-0"]:2.410702e-02[&length_mean=2.48581882e-02,length_median=2.41070200e-02,length_95%HPD={1.12175500e-02,3.92761300e-02}]);
end;
Before we move on to combine all these individual-locus analyses, let's navigate back to the main folder for our data:
$ cd ../..
$ pwd
/class/cane/pnwiki/data_results/baseline.gamma0.3_n30
Next: combining gene trees samples to get concordance factors with BUCKy.
PhyloNetworks Workshop
- home
- example data
-
TICR pipeline:
from sequences to quartet CFs
- the data
- MrBayes on all genes
- BUCKy
- Quartet MaxCut
- RAxML & ASTRAL
- PhyloNetworks: from quartet CFs or gene trees to phylogenetic networks
- TICR test: is a population tree with ILS sufficient (vs network)?
- Continuous trait evolution on a network