Gene Trees: MrBayes

We want to analyze each of the 30 loci with MrBayes. First, make sure you have MrBayes installed:

$ which mb
/Users/ane/bin/mb
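
If `which` prints nothing, MrBayes is not on your PATH. A minimal sketch of a check that reports both cases explicitly, assuming the executable is named `mb` as above (the variable name `mb_status` is only illustrative):

```shell
# Report whether an executable named "mb" can be found on the PATH.
if command -v mb >/dev/null 2>&1; then
    mb_status="found at $(command -v mb)"
else
    mb_status="not found"
fi
echo "mb: $mb_status"
```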

Next, choose settings for MrBayes (substitution model, prior for branch lengths, etc.) and save them in a MrBayes block. The block below uses the HKY model, 100,000 generations, 2 chains (1 cold and 1 heated) and 2 independent runs. These settings were chosen so that the analyses run quickly during this tutorial; for a real data set, different settings should be chosen (such as 1 million generations, 3 chains and 3 runs).

$ cat ../scripts/mbblock.txt
begin mrbayes;
set nowarnings=yes;
set autoclose=yes;
lset nst=2;
mcmcp ngen=100000 burninfrac=.25 samplefreq=50 printfreq=10000 [increase these for real]
diagnfreq=10000 nruns=2 nchains=2 temp=0.40 swapfreq=10;       [increase for real analysis]
mcmc;
sumt;
end;
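
For a real data set, one way to obtain a longer-running block is to derive it from the tutorial block with `sed`. This is only a sketch: it works on a scratch copy of the block shown above, the output name `mbblock-long.txt` is hypothetical, and only `ngen`, `nruns` and `nchains` are changed to the values suggested in the text. Other settings (such as `samplefreq` or `temp`) are left as they are and may also deserve adjusting.

```shell
# Work in a scratch directory so the tutorial files are left untouched.
workdir=$(mktemp -d)

# Copy of the tutorial block shown above (normally ../scripts/mbblock.txt).
cat > "$workdir/mbblock.txt" <<'EOF'
begin mrbayes;
set nowarnings=yes;
set autoclose=yes;
lset nst=2;
mcmcp ngen=100000 burninfrac=.25 samplefreq=50 printfreq=10000 [increase these for real]
diagnfreq=10000 nruns=2 nchains=2 temp=0.40 swapfreq=10;       [increase for real analysis]
mcmc;
sumt;
end;
EOF

# Raise ngen, nruns and nchains to the values suggested in the text.
# mbblock-long.txt is a hypothetical file name.
sed -e 's/ngen=100000/ngen=1000000/' \
    -e 's/nruns=2/nruns=3/' \
    -e 's/nchains=2/nchains=3/' \
    "$workdir/mbblock.txt" > "$workdir/mbblock-long.txt"

# Show only the lines that changed (diff exits non-zero when files differ).
diff "$workdir/mbblock.txt" "$workdir/mbblock-long.txt" || true
```

The resulting file could then be passed to the pipeline script with `-m` in place of `../scripts/mbblock.txt`.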

We are ready to analyze all loci with MrBayes:

$ ../scripts/mb.pl input/1_seqgen.tar.gz -m ../scripts/mbblock.txt -o mb-output

Script was called as follows:
perl mb.pl input/1_seqgen.tar.gz -m ../scripts/mbblock.txt -o mb-output

Appending MrBayes block to each gene... done.

Job server successfully created.

  Analyses complete: 30/30.
  All connections closed.
Total execution time: 1 hour, 9 minutes, 43 seconds.

If a cluster with several machines is available, analyses can be parallelized across machines (not just across nodes of the same machine) by adding the option --machine-file hosts.txt, where hosts.txt is a plain text file listing the machines available to use, in the format user_name@machine_address.
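
As an illustration only, the sketch below writes such a file and checks its format. The user name and the two machine addresses are placeholders, and the one-entry-per-line layout is an assumption, not something stated above.

```shell
# Hypothetical machine list: user name and addresses are placeholders.
cat > hosts.txt <<'EOF'
user_name@machine1.example.org
user_name@machine2.example.org
EOF

# Sanity check: count lines that do NOT have the user_name@machine_address form.
# (grep -c exits non-zero when the count is 0, hence the "|| true".)
bad_lines=$(grep -cv '^[^@[:space:]]\{1,\}@[^@[:space:]]\{1,\}$' hosts.txt || true)
echo "malformed entries: $bad_lines"

# The file would then be passed to the script as described above:
# ../scripts/mb.pl input/1_seqgen.tar.gz -m ../scripts/mbblock.txt -o mb-output --machine-file hosts.txt
```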

The script created a new directory named mb-output (as requested with the -o option above), which contains a compressed tarball of all MrBayes output: mb-output/1_seqgen.mb.tar

$ ls
input	mb-output

$ ls mb-output/
1_seqgen.mb.tar	1_seqgen.tar.gz

$ tar -ztf mb-output/1_seqgen.mb.tar
1_seqgen10.nex.tar.gz
1_seqgen11.nex.tar.gz
1_seqgen12.nex.tar.gz
1_seqgen13.nex.tar.gz
...
1_seqgen6.nex.tar.gz
1_seqgen7.nex.tar.gz
1_seqgen8.nex.tar.gz
1_seqgen9.nex.tar.gz

Decompressing and looking into the result file for the first locus, we find a bunch of output including the log from MrBayes (useful to track down bugs, if any) and the sample of trees from each run (*.t), which will serve as input for BUCKy.
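
The decompression step is not shown in this tutorial. One possible way to do it is sketched below with a small helper, `unpack_locus` (a hypothetical name). It assumes that the outer archive is gzip-compressed (it is listed with `tar -ztf` above) and that each per-locus tarball stores its files at the top level, with no enclosing directory.

```shell
# unpack_locus ARCHIVE LOCUS OUTDIR
# Extract the per-locus tarball LOCUS.tar.gz from the outer ARCHIVE,
# then unpack it into the directory OUTDIR/LOCUS.
unpack_locus() {
    archive=$1; locus=$2; outdir=$3
    mkdir -p "$outdir/$locus" || return 1
    # Pull only the requested member out of the outer archive.
    tar -xzf "$archive" -C "$outdir" "$locus.tar.gz" || return 1
    # Unpack the per-locus MrBayes output into its own directory.
    tar -xzf "$outdir/$locus.tar.gz" -C "$outdir/$locus" || return 1
}

# Usage with the tutorial's paths (skipped when the archive is absent).
if [ -f mb-output/1_seqgen.mb.tar ]; then
    unpack_locus mb-output/1_seqgen.mb.tar 1_seqgen1.nex mb-output/1_seqgen.mb
fi
```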

$ ls mb-output/1_seqgen.mb/1_seqgen1.nex
1_seqgen1.nex.ckp      1_seqgen1.nex.mcmc    1_seqgen1.nex.run2.t   1_seqgen1.nex.vstat
1_seqgen1.nex.ckp~     1_seqgen1.nex.parts   1_seqgen1.nex.run2.p
1_seqgen1.nex.con.tre  1_seqgen1.nex.run1.p  1_seqgen1.nex.trprobs
1_seqgen1.nex.log      1_seqgen1.nex.run1.t  1_seqgen1.nex.tstat
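
As a quick sanity check before moving on, the number of trees in each *.t file can be compared with what the settings imply. The sketch below assumes that each sampled tree sits on its own line beginning with the word `tree`; the helper name `count_trees` is hypothetical. With ngen=100000 and samplefreq=50, roughly 100000/50 trees are expected per run (the exact count may differ slightly, for instance if the starting tree is also written).

```shell
# count_trees FILE: number of lines whose first word is "tree".
count_trees() {
    grep -c '^[[:space:]]*tree[[:space:]]' "$1"
}

# Samples implied by the tutorial settings (ngen=100000, samplefreq=50).
expected=$((100000 / 50))
echo "about $expected sampled trees expected per run"

# Report the count for each run of the first locus, if the files exist.
for f in mb-output/1_seqgen.mb/1_seqgen1.nex/*.run?.t; do
    [ -f "$f" ] || continue
    echo "$f: $(count_trees "$f") trees"
done
```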

Next: combining gene tree samples to get concordance factors with BUCKy.
