First: Clone this repository
git clone https://github.com/SViswanathanLab/PhylogicNDT
cd PhylogicNDT
Then either:
Install docker from https://www.docker.com/community-edition#/download
docker build --tag phylogicndt .
chmod +x PhylogicNDT.py
./PhylogicNDT.py --help
If running from the Docker container, first run:
docker run -it -v /path/to/PhylogicNDT:/phylogicndt phylogicndt
cd phylogicndt
This starts an interactive container, with your local PhylogicNDT repository mounted as the /phylogicndt directory inside it.
To run clustering on the provided sample input data:
To specify inputs:
./PhylogicNDT.py Cluster -i Patient_ID -s Sample1_id:Sample1_maf:Sample1_CN_seg:Sample1_Purity:Sample1_Timepoint -s Sample2_id:Sample2_maf:Sample2_CN_seg:Sample2_Purity:Sample2_Timepoint ... SampleN_info
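With many samples, the -s values can be assembled programmatically. The following is a minimal Python sketch; the helper names `sample_arg` and `cluster_argv` are hypothetical, and it assumes that no field may itself contain a colon, since ':' separates the fields of each -s value.

```python
import shlex


def sample_arg(sample_id, maf, seg, purity, timepoint):
    """Join one sample's fields into a colon-separated -s value."""
    fields = [str(sample_id), str(maf), str(seg), str(purity), str(timepoint)]
    for field in fields:
        # Assumption: a ':' inside a field would make the value ambiguous, so reject it.
        if ":" in field:
            raise ValueError(f"field contains ':': {field!r}")
    return ":".join(fields)


def cluster_argv(patient_id, samples):
    """Build a Cluster command line; `samples` holds 5-tuples of sample fields."""
    argv = ["./PhylogicNDT.py", "Cluster", "-i", patient_id]
    for sample in samples:
        argv += ["-s", sample_arg(*sample)]
    return argv


if __name__ == "__main__":
    # Hypothetical file names and values; replace with your own.
    argv = cluster_argv("Patient_ID", [
        ("Sample1", "Sample1.maf", "Sample1.seg", 0.8, 0),
        ("Sample2", "Sample2.maf", "Sample2.seg", 0.65, 1),
    ])
    print(shlex.join(argv))
```

The printed command can be pasted into a shell, or the list can be passed directly to `subprocess.run`.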
Alternatively, provide a tab-separated sample information file (.sif)
with the headers: sample_id maf_fn seg_fn purity timepoint
./PhylogicNDT.py Cluster -i Patient_ID -sif Patient.sif
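A .sif file can be written with Python's standard csv module. This is a minimal sketch: the helper name `write_sif` and all sample IDs, file names, purities and timepoints are hypothetical placeholders, and Unix line endings are chosen as a safe default.

```python
import csv

# Column order as listed above: sample_id maf_fn seg_fn purity timepoint
SIF_HEADER = ["sample_id", "maf_fn", "seg_fn", "purity", "timepoint"]


def write_sif(path, samples):
    """Write a tab-separated .sif; `samples` is a list of dicts keyed by SIF_HEADER."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SIF_HEADER,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(samples)


if __name__ == "__main__":
    # Hypothetical two-sample patient; replace with your own paths and values.
    write_sif("Patient.sif", [
        {"sample_id": "Sample1", "maf_fn": "Sample1.maf", "seg_fn": "Sample1.seg",
         "purity": 0.8, "timepoint": 0},
        {"sample_id": "Sample2", "maf_fn": "Sample2.maf", "seg_fn": "Sample2.seg",
         "purity": 0.65, "timepoint": 1},
    ])
```

`csv.DictWriter` raises a `ValueError` if a row contains a key that is not one of the five headers, which catches misspelled column names early.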
The .maf file should contain pre-computed raw CCF histograms based on each mutation's alt/ref counts
(Absolute-annotated MAFs or .Rdata files are also supported).
If the CCF histograms are absent, the --maf_input_type flag must be set to calc_ccf,
and the sample purity must be provided. The local copy number must also
be attached to each mutation in the MAF, in columns named local_cn_a1
and local_cn_a2.
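Before running with calc_ccf, it can help to confirm that each MAF carries the two local copy-number columns. A minimal sketch; the helper name `missing_cn_columns` is hypothetical, and skipping leading '#' lines is an assumption based on the common MAF convention of a '#version' comment line before the header.

```python
import sys

REQUIRED_CN_COLUMNS = ("local_cn_a1", "local_cn_a2")


def missing_cn_columns(maf_path):
    """Return the required local copy-number columns absent from a MAF header."""
    with open(maf_path) as handle:
        for line in handle:
            # Skip leading comment lines such as '#version 2.4'.
            if not line.startswith("#"):
                header = line.rstrip("\r\n").split("\t")
                break
        else:
            raise ValueError(f"no header line found in {maf_path}")
    return [col for col in REQUIRED_CN_COLUMNS if col not in header]


if __name__ == "__main__":
    # Usage: python check_maf.py Sample1.maf Sample2.maf ...
    for path in sys.argv[1:]:
        missing = missing_cn_columns(path)
        print(path, "OK" if not missing else "missing: " + ", ".join(missing))
```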
The CN_seg file is optional; it is used to annotate copy-number information on the trees.
To specify the number of iterations:
./PhylogicNDT.py Cluster -ni 1000
Acknowledgment: The Clustering module (primary 1D clustering) is partially inspired by earlier work of Carter & Getz (Landau D, Carter S, Stojanov P et al. Cell 152, 714–726, 2013).
The GrowthKinetics module fully incorporates the BuildTree libraries, so when growth rates are desired there is no need to run both.
- The -w flag should provide a measure of tumor burden, with one value per input sample MAF used in clustering. When omitted, a stable tumor burden is assumed.
- The -t flag should provide relative time for spacing the samples. When omitted, equal spacing is assumed.
Just BuildTree
./PhylogicNDT.py BuildTree -i Indiv_ID -sif Patient.sif -m mutation_ccf_file -c cluster_ccf_file
GrowthKinetics
./PhylogicNDT.py GrowthKinetics -i Indiv_ID -sif Patient.sif -ab cell_population_abundance_mcmc_trace -w 10 10 10 10 10 -t 1 2 3 4 5
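Because -w expects one value per sample, a mismatch between the number of values and the number of samples is an easy mistake. The sketch below builds the GrowthKinetics command and checks the lengths; the helper name is hypothetical, and it assumes -t also takes one value per sample, as in the five-sample example above.

```python
def growth_kinetics_argv(indiv_id, sif, abundance_trace, n_samples,
                         tumor_burden=None, times=None):
    """Build a GrowthKinetics command line, checking the -w/-t value counts."""
    argv = ["./PhylogicNDT.py", "GrowthKinetics", "-i", indiv_id,
            "-sif", sif, "-ab", abundance_trace]
    for flag, values in (("-w", tumor_burden), ("-t", times)):
        if values is None:
            # Flag omitted: stable tumor burden (-w) or equal spacing (-t) is assumed.
            continue
        if len(values) != n_samples:
            raise ValueError(f"{flag} needs {n_samples} values, got {len(values)}")
        argv += [flag] + [str(v) for v in values]
    return argv
```

Called with five samples, `tumor_burden=[10] * 5` and `times=[1, 2, 3, 4, 5]`, it reproduces the GrowthKinetics command shown above.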
Run Cluster together with BuildTree
./PhylogicNDT.py Cluster -i Patient_ID -sif Patient.sif -rb
SinglePatientTiming requires a maf input and a seg file input for each sample. The maf file should be the output of the PhylogicNDT Clustering module. The seg file should have the following columns:
Chromosome Start End A1.Seg.CN A2.Seg.CN
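A seg file with these columns can be written as follows. This is a minimal sketch that assumes the file is tab-separated like the other inputs (the README lists the columns but does not state the delimiter); the helper name `write_seg` and the example segment are hypothetical.

```python
import csv

SEG_COLUMNS = ["Chromosome", "Start", "End", "A1.Seg.CN", "A2.Seg.CN"]


def write_seg(path, segments):
    """Write allele-specific segments as a tab-separated seg file.

    `segments` is an iterable of (chromosome, start, end, a1_cn, a2_cn) tuples.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(SEG_COLUMNS)
        for chrom, start, end, a1_cn, a2_cn in segments:
            # Reject malformed segments rather than writing them silently.
            if end < start:
                raise ValueError(f"segment ends before it starts: {chrom}:{start}-{end}")
            writer.writerow([chrom, start, end, a1_cn, a2_cn])
```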
To run SinglePatientTiming:
./PhylogicNDT.py Timing -i Indiv_ID -sif Patient.sif
LeagueModel requires comparison tables as input. These should be the outputs of SinglePatientTiming, whose file names end in ".comp.tsv".
To run LeagueModel:
./PhylogicNDT.py LeagueModel -cohort Cohort -comps comp1 comp2 ... compN
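When a cohort has many patients, the comparison tables can be collected with a glob instead of being listed by hand. A minimal sketch; the helper name and the assumption that all SinglePatientTiming outputs sit in one directory are hypothetical.

```python
import glob
import os


def league_model_argv(cohort, comp_dir):
    """Build a LeagueModel command from every *.comp.tsv file in comp_dir."""
    comps = sorted(glob.glob(os.path.join(comp_dir, "*.comp.tsv")))
    if not comps:
        raise FileNotFoundError(f"no *.comp.tsv files found in {comp_dir}")
    return ["./PhylogicNDT.py", "LeagueModel", "-cohort", cohort, "-comps"] + comps
```

Sorting keeps the argument order reproducible between runs; the returned list can be passed to `subprocess.run`.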
Alternatively, one can use a single aggregated table. The table should have the following columns:
sample event1 event2 p_event1_win p_event2_win unknown
To run with the aggregated table:
./PhylogicNDT.py LeagueModel -cohort Cohort -comparison_cn comps
A simulation module is provided for convenience.
./PhylogicNDT.py PhylogicSim --help
Display all the options and help.
./PhylogicNDT.py PhylogicSim
Run the simulation with the default parameters.
./PhylogicNDT.py PhylogicSim -i MySimulation
Specify a prefix for all the output files.
./PhylogicNDT.py PhylogicSim -i MySimulation -ns 7
Specify the number of samples you want to simulate.
./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5
Specify the number of distinct clones present in your samples. The minimum is 2 (the first clone is always the clonal one).
./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -seg /Example_SegFile.txt
Specify a segment file with copy number values to sample from. See "Example_SegFile.txt" for a format example. If no file is specified, a built-in CN profile based on the hg19 contigs is used.
./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt
Force the CCF values of each cluster in each sample, instead of generating a new random phylogeny from scratch. If -clust_file is specified, the -ns and -nodes flags are ignored and replaced with the values from the clust_file. Each line of the TSV file represents a sample, with each tab-separated value being the CCF of one cluster. The last value of each line must always be -1, to account for the artifact cluster.
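A clust_file in this layout can be generated from a list of per-sample CCFs. A minimal sketch; the helper name `write_clust_file` is hypothetical, and the [0, 1] range check reflects the definition of CCF rather than a documented PhylogicSim requirement.

```python
def write_clust_file(path, ccfs_per_sample):
    """Write one line per sample: its cluster CCFs, then the -1 artifact column.

    Every sample must list the same clusters in the same order.
    """
    widths = {len(row) for row in ccfs_per_sample}
    if len(widths) != 1:
        raise ValueError("every sample must list the same number of clusters")
    for row in ccfs_per_sample:
        if any(not 0.0 <= ccf <= 1.0 for ccf in row):
            raise ValueError(f"CCFs must lie in [0, 1]: {row}")
    with open(path, "w") as handle:
        for row in ccfs_per_sample:
            # The trailing -1 accounts for the artifact cluster.
            values = list(row) + [-1]
            handle.write("\t".join(str(v) for v in values) + "\n")
```

For example, two samples with three clusters, `[[1.0, 0.6, 0.2], [1.0, 0.3, 0.5]]`, produce the lines `1.0 0.6 0.2 -1` and `1.0 0.3 0.5 -1` (tab-separated).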
./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt -a 0.3
Specify the proportion of mutations that are artifactual (i.e., with a random allele fraction unrelated to the mutation or copy number). Can be combined with a clust_file.
./PhylogicNDT.py PhylogicSim -i MySimulation -nodes 5 -clust_file /Example_Clust_File.txt -pfile /Example_PurityFile.txt
TSV file specifying the purity of each sample individually (otherwise, a single purity for all samples is set with the -p flag). Each line represents a sample. The file can optionally contain three extra columns with the alpha, beta and N values of the coverage beta-binomial for each sample (otherwise, those values are set for all samples with the -ap, -b and -nb flags, respectively).
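A purity file can be generated as below. This is a minimal sketch that assumes tab-separated values and no header row (the tab separation mirrors the clust_file description; neither is stated explicitly for this file), with the optional columns in the order alpha, beta, N given above. The helper name `write_purity_file` is hypothetical.

```python
def write_purity_file(path, purities, coverage=None):
    """Write one line per sample: purity, optionally followed by alpha, beta and N.

    `coverage`, if given, holds one (alpha, beta, N) tuple per sample.
    """
    if coverage is not None and len(coverage) != len(purities):
        raise ValueError("need one (alpha, beta, N) tuple per sample")
    if any(not 0.0 < p <= 1.0 for p in purities):
        raise ValueError("purities must lie in (0, 1]")
    with open(path, "w") as handle:
        for i, purity in enumerate(purities):
            fields = [purity]
            if coverage is not None:
                # Optional per-sample beta-binomial coverage parameters.
                alpha, beta, n = coverage[i]
                fields += [alpha, beta, n]
            handle.write("\t".join(str(f) for f in fields) + "\n")
```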