Code related to the manuscript Inferring signaling pathways with probabilistic programming (Merrell & Gitter, 2020) Bioinformatics, 36:Supplement_2, i822–i830.
This repository contains the following:
- SSPS: A method that infers relationships between variables using time series data.
  - Modeling assumption: the time series data is generated by a Dynamic Bayesian Network (DBN).
  - Inference strategy: MCMC sampling over possible DBN structures.
  - Implementation: written in Julia, using the Gen probabilistic programming language.
- Analysis code:
  - simulation studies;
  - convergence analyses;
  - evaluation on experimental data;
  - a Snakefile for managing all of the analyses.
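To make the modeling assumption and inference strategy listed above more concrete, here is a minimal Python sketch of MCMC sampling over DBN structures. It is not the SSPS implementation (which is written in Julia with Gen) and does not reproduce its model: the linear-Gaussian dynamics, the fixed edge weight `WEIGHT`, the noise level `NOISE`, the per-edge `sparsity` penalty, and all function names are assumptions chosen only for illustration.

```python
import math
import random

WEIGHT, NOISE = 0.9, 0.1  # assumed known edge weight and noise std (toy values)


def simulate(parents, n_steps, rng):
    """Simulate a toy linear-Gaussian DBN: x_j(t) depends on its parents at t-1."""
    n = len(parents)
    series = [[rng.gauss(0.0, NOISE) for _ in range(n)]]
    for _ in range(n_steps - 1):
        prev = series[-1]
        series.append([WEIGHT * sum(prev[p] for p in parents[j]) + rng.gauss(0.0, NOISE)
                       for j in range(n)])
    return series


def log_score(edges, series, sparsity=2.0):
    """Unnormalized log posterior: Gaussian log-likelihood plus a per-edge penalty."""
    n = len(series[0])
    total = -sparsity * len(edges)
    for prev, cur in zip(series, series[1:]):
        for j in range(n):
            mean = WEIGHT * sum(prev[i] for i in range(n) if (i, j) in edges)
            total -= 0.5 * ((cur[j] - mean) / NOISE) ** 2
    return total


def sample_structures(series, n_iters, rng):
    """Metropolis-Hastings over edge sets; returns the fraction of iterations
    in which each edge (i -> j) was present."""
    n = len(series[0])
    edges, score = set(), log_score(set(), series)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_iters):
        # Propose toggling one randomly chosen edge (a symmetric proposal).
        proposal = edges ^ {(rng.randrange(n), rng.randrange(n))}
        new_score = log_score(proposal, series)
        if rng.random() < math.exp(min(0.0, new_score - score)):
            edges, score = proposal, new_score
        for i, j in edges:
            counts[i][j] += 1
    return [[c / n_iters for c in row] for row in counts]


rng = random.Random(0)
true_parents = {0: [], 1: [0], 2: [1]}  # toy network with edges 0 -> 1 and 1 -> 2
series = simulate(true_parents, 200, rng)
freqs = sample_structures(series, 500, rng)
for row in freqs:
    print(["%.2f" % f for f in row])
```

Entry `freqs[i][j]` estimates how often the chain included the edge from variable `i` to variable `j`. Single-edge toggles are the simplest possible proposal; they can get stuck in local modes on harder problems, which is one reason a real method needs more careful inference.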
(If you plan to reproduce all of the analyses, then make sure you're on a host with access to plenty of CPUs. Ideally, you would have access to a cluster of some sort.)
- Clone this repository
git clone [email protected]:gitter-lab/ssps.git
- Install Julia 1.6 (and all Julia dependencies)
  - Download the correct Julia binary here: https://julialang.org/downloads/.
    E.g., for Linux x86_64:
    $ wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.7-linux-x86_64.tar.gz
    $ tar -xvzf julia-1.6.7-linux-x86_64.tar.gz
  - Find additional installation instructions here: https://julialang.org/downloads/platform/.
  - Use Pkg -- Julia's package manager -- to install the project's Julia dependencies:
    $ cd ssps/SSPS
    $ julia --project=.
                   _
       _       _ _(_)_     |  Documentation: https://docs.julialang.org
      (_)     | (_) (_)    |
       _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 1.6.7 (2022-07-19)
     _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
    |__/                   |

    julia> using Pkg
    julia> Pkg.instantiate()
    julia> exit()
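If you would rather script the dependency installation than type it into the Julia REPL, the same `Pkg.instantiate()` call can be run non-interactively with Julia's `-e` flag. The sketch below only wraps that command in Python; the helper names are hypothetical, and the default path `ssps/SSPS` assumes you are in the directory where you cloned the repository.

```python
import subprocess


def instantiate_command():
    """Build the Julia command that installs the active project's dependencies."""
    return ["julia", "--project=.", "-e", "using Pkg; Pkg.instantiate()"]


def instantiate(project_dir="ssps/SSPS"):
    """Run the command inside project_dir (assumed location of the Julia project)."""
    subprocess.run(instantiate_command(), cwd=project_dir, check=True)


# Show the command without running it; call instantiate() to actually install.
print(" ".join(instantiate_command()))
```

`check=True` makes the script fail loudly if Julia exits with an error, e.g. when a package cannot be resolved.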
In order to reproduce the analyses, you will need some extra bits of software.
- We use Snakemake -- a python package -- to manage the analysis workflow.
- We use some other python packages to postprocess the results, produce plots, etc.
- Some of the baseline methods are implemented in R or MATLAB.
Hence, the analyses entail some extra setup:
- Install python dependencies (using conda)
  - For the purposes of these instructions, we assume you have Anaconda3 or Miniconda3 installed, and have access to the conda environment manager. (We recommend using Miniconda; find full installation instructions here.)
  - We recommend setting up a dedicated virtual environment for this project. The following will create a new environment named ssps and install the required python packages:
    $ conda create -n ssps -c conda-forge pandas matplotlib numpy bioconda::snakemake-minimal
    $ conda activate ssps
    (ssps) $
  - If you plan to reproduce the analyses on a cluster, then install cookiecutter and the complete version of snakemake
    (ssps) $ conda install -c conda-forge cookiecutter bioconda::snakemake
    and find the appropriate Snakemake profile from this list: https://github.com/Snakemake-Profiles/doc. Then install the Snakemake profile using cookiecutter:
    (ssps) $ cookiecutter https://github.com/Snakemake-Profiles/htcondor.git
    replacing the example with the desired profile.
- Install R packages
- Check whether MATLAB is installed.
  - If you don't have MATLAB, then you won't be able to run the exact DBN inference method of Hill et al., 2012.
  - You'll need to comment out the hill method wherever it appears in analysis_config.yaml.
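A quick way to check the setup described above, including whether MATLAB is available, is to look for each executable on your `PATH`. This is a minimal sketch using Python's standard `shutil.which`; the list of executable names (in particular `Rscript` for R and `matlab` for MATLAB) is an assumption about how these tools are exposed on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
TOOLS = ("julia", "conda", "snakemake", "Rscript", "matlab")


def find_tools(names=TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name:10s} {path if path else 'NOT FOUND'}")
```

If `matlab` is reported as not found, comment out the hill method in analysis_config.yaml before running the analyses.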
After completing this additional setup, we are ready to run the analyses.
- Make any necessary modifications to the configuration file: analysis_config.yaml. This file controls the space of hyperparameters and datasets explored in the analyses.
- Run the analyses using snakemake:
  - If you're running the analyses on your local host, simply move to the directory containing Snakefile and call snakemake.
    (ssps) $ cd ssps
    (ssps) $ snakemake
  - Since Julia is a dynamically compiled language, some time will be devoted to compilation when you run SSPS for the first time. You may see some warnings in stdout -- this is normal.
  - If you're running the analyses on a cluster, call snakemake with the Snakemake profile you installed during setup:
    (ssps) $ cd ssps
    (ssps) $ snakemake --profile YOUR_PROFILE_NAME
    (You will probably need to edit the job submission parameters in the profile's config.yaml file.)
file.) - If you're running the analyses on your local host, simply move to the directory containing
- Relax. It will take tens of thousands of CPU-hours to run all of the analyses.
Follow these steps to run SSPS on your dataset. You will need:
- a CSV file (tab separated) containing your time series data;
- a CSV file (comma separated) containing your prior edge confidences;
- optional: a JSON file containing a list of variable names (i.e., node names).
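Before running SSPS it can help to sanity-check that the three inputs listed above agree with each other. The sketch below is only an illustration: this README does not specify the file layouts, so the code assumes (hypothetically) that each time series row holds one value per variable, that the prior is a square matrix of confidences in [0, 1] with one row and column per variable, and that the JSON file is a flat list of names. Check the example files in the repository for the actual layout. The toy data is made up and held in strings so the example is self-contained.

```python
import csv
import io
import json


def read_table(text, delimiter):
    """Parse delimited text into a list of non-empty rows."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]


def check_prior(prior_rows, names):
    """Check a square prior matrix (assumed layout) against the node names."""
    n = len(names)
    if len(prior_rows) != n or any(len(row) != n for row in prior_rows):
        raise ValueError(f"expected a {n}x{n} prior matrix")
    prior = [[float(v) for v in row] for row in prior_rows]
    if any(not 0.0 <= v <= 1.0 for row in prior for v in row):
        raise ValueError("prior edge confidences are assumed to lie in [0, 1]")
    return prior


# Toy stand-ins for the three input files (contents are made up).
names = json.loads('["A", "B", "C"]')
timeseries = read_table("0.1\t0.2\t0.3\n0.2\t0.1\t0.4\n", delimiter="\t")
prior = check_prior(
    read_table("0.5,0.9,0.1\n0.1,0.5,0.9\n0.1,0.1,0.5\n", delimiter=","), names
)

# Under the assumed layout, every time series row has one value per variable.
if any(len(row) != len(names) for row in timeseries):
    raise ValueError("time series rows do not match the number of variables")
print(f"{len(timeseries)} time points, {len(names)} variables, prior OK")
```

To use this on real files, replace the strings with the contents of your files, e.g. `open(path, newline="").read()`.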
- Install the python dependencies if you haven't already. Find detailed instructions above.
- cd to the run_ssps directory.
- Configure the parameters in ssps_config.yaml as appropriate.
- Run Snakemake:
  $ snakemake --cores 1
  Increase 1 to raise the maximum number of CPU cores to be used.
SSPS allows two levels of parallelism: (1) at the Markov chain level and (2) at the iteration level.
- Chain-level parallelism is provided via Snakemake. For example, Snakemake can run 4 chains simultaneously if you specify --cores 4 at the command line:
  $ snakemake --cores 4
  In essence, this just creates 4 instances of SSPS that run simultaneously.
- Iteration-level parallelism is provided by Julia's multi-threading features. The number of threads available to an SSPS instance is specified by an environment variable: JULIA_NUM_THREADS.
- The total number of CPUs used by your SSPS jobs is the product of Snakemake's --cores parameter and Julia's JULIA_NUM_THREADS environment variable. Concretely: if we run snakemake --cores 2 and have JULIA_NUM_THREADS=4, then up to 8 CPUs may be used at one time by the SSPS jobs.
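The CPU arithmetic above can be sketched in a few lines of Python. The helper names are hypothetical; the only things taken from this README are Snakemake's `--cores` option, the `JULIA_NUM_THREADS` environment variable, and the rule that total CPU usage is their product.

```python
import os


def total_cpus(snakemake_cores, julia_threads):
    """Upper bound on CPUs in use: concurrent chains times threads per chain."""
    return snakemake_cores * julia_threads


def ssps_environment(julia_threads, base_env=None):
    """Copy an environment and set JULIA_NUM_THREADS for the SSPS jobs."""
    env = dict(os.environ if base_env is None else base_env)
    env["JULIA_NUM_THREADS"] = str(julia_threads)
    return env


def snakemake_command(cores):
    """Build the Snakemake invocation for the given number of cores."""
    return ["snakemake", "--cores", str(cores)]


cores, threads = 2, 4
print(" ".join(snakemake_command(cores)),
      f"with JULIA_NUM_THREADS={threads}",
      f"uses up to {total_cpus(cores, threads)} CPUs")
```

To launch the run from Python, pass both pieces to `subprocess.run(snakemake_command(cores), env=ssps_environment(threads), check=True)` from the run_ssps directory.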
SSPS is available under the MIT License, Copyright © 2020 David Merrell.
The MATLAB code dynamic_network_inference.m has been modified from the original version, Copyright © 2012 Steven Hill and Sach Mukherjee.
The dream-challenge data is described in Hill et al., 2016 and is originally from Synapse.