MERLIN

Overview of the MERLIN (Mitochondrial EvolutionaRy Lineage INference) algorithm

paper: https://academic.oup.com/bioinformatics/article/40/Supplement_1/i218/7700844

Contents

  1. Pre-requisites
  2. Usage instructions

Pre-requisites

Usage instructions

Simulation

The input to MERLIN consists of two CSV files: one containing the total read counts and one containing the variant read counts. Both matrices should have mutations as rows and cells as columns.

The format must match the example input files total_matrix.csv and variant_matrix.csv in data/example, which can be generated with the following commands.

mkdir data/example/
python src/simulation.py -n 50 -m 5 -g 5 -c 50 -o data/example/
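Before running MERLIN on your own data, it can help to check that the two matrices line up. The sketch below is not part of MERLIN. It assumes each CSV has a header row of cell names and a first column of mutation names (verify this against the files in data/example), and the helper names `read_matrix` and `check_matrices` are hypothetical.

```python
import csv


def read_matrix(path):
    """Read a CSV into (cell_names, mutation_names, rows of integer counts).

    Assumption: row 0 is a header of cell names and column 0 holds mutation names.
    """
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    cells = rows[0][1:]
    mutations = [row[0] for row in rows[1:]]
    counts = [[int(float(value)) for value in row[1:]] for row in rows[1:]]
    return cells, mutations, counts


def check_matrices(total_path, variant_path):
    """Return a list of problems; an empty list means the pair looks consistent."""
    t_cells, t_muts, total = read_matrix(total_path)
    v_cells, v_muts, variant = read_matrix(variant_path)
    problems = []
    if t_cells != v_cells:
        problems.append("cell (column) labels differ")
    if t_muts != v_muts:
        problems.append("mutation (row) labels differ")
    if not problems:
        # A variant read count can never exceed the total read count.
        for mut, t_row, v_row in zip(t_muts, total, variant):
            for cell, t, v in zip(t_cells, t_row, v_row):
                if v > t:
                    problems.append(f"variant > total at {mut}, {cell}")
    return problems
```

Calling `check_matrices("data/example/total_matrix.csv", "data/example/variant_matrix.csv")` should return an empty list if the files follow the assumed layout.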

Simulation usage

usage: simulation.py -m n_mutation -n n_cells -g n_clones -c coverage [-t threshold] -o O 

optional arguments:
  -h, --help      show this help message and exit
  -m              number of mutations
  -n              number of cells
  -g              number of clones
  -c, --coverage  expected sequencing coverage for simulated data
  -t, --threshold minimum variant allele frequency (default 0.05)
  -o, --out       output directory
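
The threshold option refers to a variant allele frequency (VAF), which is the variant read count divided by the total read count. How simulation.py applies the threshold internally is not documented here, so the sketch below only illustrates the quantity itself and one plausible reading of "minimum VAF"; the function names are hypothetical and not part of MERLIN.

```python
def variant_allele_frequency(variant, total):
    """Element-wise variant / total; entries with zero coverage are reported as 0.0."""
    return [
        [v / t if t > 0 else 0.0 for v, t in zip(v_row, t_row)]
        for v_row, t_row in zip(variant, total)
    ]


def mutations_above_threshold(variant, total, threshold=0.05):
    """Indices of mutations (rows) whose VAF reaches the threshold in at least one cell.

    Assumption: this "at least one cell" rule is illustrative, not MERLIN's own rule.
    """
    vaf = variant_allele_frequency(variant, total)
    return [i for i, row in enumerate(vaf) if any(f >= threshold for f in row)]
```

Both functions take matrices as lists of rows, with mutations as rows and cells as columns, matching the input layout described above.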

Simulation Output

  • variant_matrix.csv / total_matrix.csv : input to MERLIN
  • tree.txt : ground-truth clone tree
  • cell_tree.txt : ground-truth cell lineage tree
  • cell_to_clone_mapping.txt
  • mutation_to_clone_mapping.txt

MERLIN script

usage: merlin.py [-h] [-t T] [-v V] -o O 

optional arguments:
  -h, --help      show this help message and exit
  -t, --total     csv file with total read count matrix
  -v, --variant   csv file with variant read count matrix
  -o, --out       output prefix

An example of usage is as follows.

$ python src/merlin.py -t data/example/total_matrix.csv -v data/example/variant_matrix.csv -o data/example/

Output

MERLIN produces the following output files:

  • The inferred clone tree $S$, as {output_prefix}_clone_tree_edge_list.txt
  • The $U$-matrix from the factorization $F=UB$, as {output_prefix}_Umatrix.csv
  • The binarized $U$-matrix as {output_prefix}_Amatrix.csv
  • The ancestral graph $G$ inferred from frequency matrix $F$ as {output_prefix}_ancestry_edge_list.txt

Example output for the example input above can be found in data/example.

Comments

We recommend using the pipeline described in MQuad to select informative mitochondrial variants. Note that MERLIN's run time is reasonable (under 3 hours) for $m\leq 30$ mutations. In certain cases, users may need to perform additional clustering or filtering of the mitochondrial SNPs.