Code for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Fitness can also be a function of the relative and absolute frequencies of other genotypes (i.e., frequency-dependent fitness). Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulating multi-species scenarios and therapeutic interventions is also possible. Simulations use continuous-time models and can include driver and passenger genes and modules. Also included are functions for: simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, additive NK, Ising, and Eggbox models) and plotting them.
There is also new functionality to allow for frequency-dependent fitness, interventions (including adaptive therapy), user specification of both birth and death rates, and user-defined variables that can be used for interventions.
Supported by: grant BFU2015-67302-R (MINECO/FEDER, EU) funded by MCIN/AEI/10.13039/501100011033 and by ERDF A way of making Europe to R. Diaz-Uriarte; grant PID2019-111256RB-I00 funded by MCIN/AEI/10.13039/501100011033 to R. Diaz-Uriarte; "Beca de Colaboración" at the Universidad Autónoma de Madrid from Spanish Ministry of Education, 2017-18, to S. Sánchez Carrillo; Comunidad de Madrid's PEJ16/MED/AI-1709 and PEJ-2019-AI/BMD-13961 to R. Diaz-Uriarte.
The canonical reference for OncoSimulR is: Diaz-Uriarte, R. 2017. OncoSimulR: genetic simulation with arbitrary epistasis and mutator genes in asexual populations. Bioinformatics, 33 (12): 1898-1899. https://doi.org/10.1093/bioinformatics/btx077. This paper gives a quick overview of OncoSimulR. If you use the package in publications, please cite the Bioinformatics paper.
You can also take a look at this poster presented at ECCB 2016.
The chapter "Simulating evolution in asexual populations with epistasis", in K.-C. Wong (ed.), Epistasis: Methods and Protocols (Methods in Molecular Biology), emphasizes the use of OncoSimulR for simulations with epistasis.
An earlier version of this code was used in the paper "Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling", BMC Bioinformatics, 2015, 16:41. OncoSimulR has also been used extensively in the simulations reported in the Bioinformatics paper "Cancer Progression Models And Fitness Landscapes: A Many-To-Many Relationship" and the PLoS Computational Biology paper "Every which way? On predicting tumor evolution using cancer progression models".
You can also find OncoSimulR on the Genetic Simulation Resources catalogue.
The /OncoSimulR directory contains the code for the BioConductor package OncoSimulR. The /miscell-files directory contains additional files; so far, these are all related to the above.
The frequency-dependent fitness functionality became available in BioConductor starting with version 3.13 and is now available in both the stable and devel versions. To use the most recent code in BioConductor, install the devel version.
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("OncoSimulR", version = "devel")
If you want to use the stable BioConductor version, remove the version = "devel" part of the invocation.
To start using it:
library(OncoSimulR)
This should work for all operating systems. If not, or if you want to install from sources, read on.
You can install from GitHub as follows:
if (!require("devtools"))
    install.packages("devtools") ## if you don't have it already
library(devtools)
install_github("rdiaz02/OncoSimul/OncoSimulR",
               dependencies = TRUE)
(Setting dependencies = TRUE ensures that "Suggests" packages are also installed.)
If you use Rtools40 (the toolchain introduced with R-4.0.0 in April 2020, so a long time ago already; https://cran.r-project.org/bin/windows/Rtools/) or a later Rtools, you can install OncoSimulR from sources. Note that these are old notes.
How to do it: The standard installation procedure should work, but the following steps might help.
- Install Rtools42 and R as explained in https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html
- Now, install OncoSimulR from BioConductor (to resolve all dependencies in one go): https://www.bioconductor.org/packages/devel/bioc/html/OncoSimulR.html For a fresh R installation this can take more than one hour.
- Clone the git repo and move to that directory.
- Open a MINGW shell console and install. For example, if you have installed R-testing under 'C:\R', you can do
/c/R/R-testing/bin/x64/R CMD INSTALL --no-multiarch OncoSimulR
Alternatively, install from a local file; you need to specify the tar.gz file (a zip file will not work, of course, since the R-testing that ships with Rtools does not install from zip files).
Installing from source takes a while (more than 5 minutes).
The GitHub repository for this package is https://github.com/rdiaz02/OncoSimul . Since mid-2017 BioConductor has been maintained using git, but since this directory contains other files and directories in addition to the OncoSimulR package itself, I have not used the option "Sync an existing GitHub repository with Bioconductor". Instead, I keep using this GitHub repo and locally update a Bioconductor-only repository of just the OncoSimulR code (as explained in Maintain a Bioconductor-only repository for an existing package).
Like any R/BioConductor package, OncoSimulR comes with documentation for its user-visible functions and data sets (using the help is just standard R usage). From OncoSimulR's BioConductor page you have access to the standard documentation: the manual and the overview (the vignette). The best place to start is the vignette (created from the OncoSimulR/vignettes/OncoSimulR.Rnw file, which includes both text and code).
You can view the vignette from R itself by running
browseVignettes("OncoSimulR")
which gives you access to the HTML, the Rmd file (markdown + R), and the R code.
From these two links you can also browse the HTML vignette and get a PDF version.
These files correspond to the most recent GitHub version of the package (i.e., they might include changes not yet available in the BioConductor package). Beware that they might have figures and R code that do not fit on the page, etc.
The original implementation is by Ramon Diaz-Uriarte.
The frequency-dependent fitness functionality is based on Sergio Sanchez-Carrillo's Master's thesis (see also file 'miscell-files/Sergio_Sanchez-Carillo-improvements-post-TFM.pdf' for additional features that were not described in the original thesis), and additional functionality was added by Juan Antonio Miguel González (2020). Ramon Diaz-Uriarte further simplified and generalized the specification of fitness under frequency-dependence.
Work by Niklas Endres (late 2019 to early 2020) makes it possible to use T (time) in fitness: fitness could already depend on population sizes, and now it can also depend on time. This allows for complex scenarios, as illustrated in the vignette (but adaptive therapy was still not really possible, as we could not condition arbitrarily on previous states).
Work by Javier Muñoz Haro (spring 2021) makes it possible to intervene on population sizes; interventions can take place at arbitrary times, be repeated, etc. This allows extremely flexible intervention mechanisms, also illustrated in the vignette (but adaptive therapy was still not possible, as we could not use arbitrary user variables to condition on previous states of the population).
Work by Alberto González Klein (spring 2021) allows users to specify death rates. Before this work, death rates were not user-modifiable: they were 1 for the exponential model and a fixed function of population size in the McFarland model. Now, users can specify death rates and use frequency-dependent models that affect birth, death, or both.
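The fixed death rates mentioned above can be written out explicitly. This is a minimal worked sketch, assuming the McFarland death rate documented for oncoSimulIndiv, D(N) = log(1 + N/K), where N is the total population size, and assuming the default equilibrium size K = initSize/(e - 1), with N_0 standing for initSize; please check the vignette and ?oncoSimulIndiv for the authoritative definitions:

```latex
% Exponential model: constant death rate.
D_{\mathrm{Exp}} = 1
% McFarland model (assumed form): death rate grows with total population size N.
D_{\mathrm{McFL}}(N) = \log\left(1 + \frac{N}{K}\right),
\qquad K = \frac{N_0}{e - 1}
% At the initial size N = N_0 the two models agree:
D_{\mathrm{McFL}}(N_0) = \log\left(1 + (e - 1)\right) = \log e = 1
```

Under these assumptions, a McFarland population starts with the same death rate (1) as the exponential model, and its death rate then increases logarithmically as the population grows.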
Work by Javier López Cano (spring 2022) unifies the additions by Javier Muñoz Haro and Alberto González Klein and also makes user variables possible; now we can emulate adaptive therapy (see the vignette). (Note, though, that two minor limitations remain: (a) we cannot change mutation rates; (b) it is unclear whether we can modify fitness based on user variables.)
The R/BioConductor OncoSimulR package is licensed under the GPLv3 license. The code for the OncoSimulR BioConductor package, except for functions plot.stream and plot.stacked, is Copyright 2012-2020 by Ramon Diaz-Uriarte; the code for frequency-dependent fitness is Copyright 2017-2019 Sergio Sanchez-Carrillo and 2019-2020 Juan Antonio Miguel Gonzalez. plot.stream and plot.stacked are Copyright 2013-2016 by Marc Taylor (see also https://github.com/marchtaylor/sinkr and http://menugget.blogspot.com.es/2013/12/data-mountains-and-streams-stacked-area.html).
The code under src/FitnessLandascape is from MAGELLAN, Maps of Genetical Landscapes. The authors are S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferreti. I downloaded the sources on 2019-06-05 from http://wwwabi.snv.jussieu.fr/public/magellan/latest.tgz. The code is under the GPLv3. MAGELLAN is "an integrated tool to visualize and analyze fitness landscapes of small dimension (up to 7-8 loci)". In OncoSimulR we use only a very limited subset of the functionality of MAGELLAN (mostly to generate different types of random fitness landscapes and to compute statistics of epistasis); the Makevars file we use compiles only two of the executables (fl_statistics and fl_generate), although the directory src/FitnessLandascape contains the complete sources. Note also that the plots of fitness landscapes in OncoSimulR blatantly copy the look of MAGELLAN's plots.
The code under OncoSimulR/src/exprtk.h is from The C++ Mathematical Expression Toolkit Library (ExprTk). This code is copyright Arash Partow, and is licensed under "The MIT License (MIT)" (http://www.opensource.org/licenses/MIT), which is compatible with the GPL (http://directory.fsf.org/wiki/License:Expat). The file was originally downloaded from http://www.partow.net/programming/exprtk/index.html on 2017-05-15. The most recent version was downloaded again on 2022-09-12 from the ExprTk repository. The file was originally named exprtk.hpp; to conform to R's requirements, it was renamed as exprtk.h.
The code in OncoSimulR/src/multivariate_hypergeometric.cpp and OncoSimulR/src/multivariate_hypergeometric.h is from extraDistr. This code is copyright Tymoteusz Wolodzko, and is licensed under the GNU GPL 2.
The code in miscell-files/randutils.h is copyright Melissa E. O'Neill, and is licensed under "The MIT License (MIT)" in the terms explained in the file itself. This license is compatible with the GPL. The file randutils.hpp was downloaded from https://gist.github.com/imneme/540829265469e673d045 on 2015-06-20 and is also referenced from the main article "Ease of Use without Loss of Power" (http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html). I renamed it to randutils.h to conform to R's requirements (and changed the
auto exit_func = hash(&_Exit);
line to keep R from complaining about the _Exit function). I had to disable usage of randutils for now, since I could not get it to compile with gcc-4.6 (since version 3.3 of R, the official Rtools for Windows support C++11, so I might change this in the near future).
The file under gitinfo-hooks is Copyright 2011 Brent Longborough, is part of the gitinfo package, and is under the LaTeX Project Public License 1.3, which is incompatible with the GPL. Note this file is not part of the OncoSimulR BioConductor package.
Functions nem_transitive.reduction and nem_transitive.closure are from the nem package, which was last available for version 3.11 of BioConductor (https://www.bioconductor.org/packages/3.11/bioc/html/nem.html). The authors of the package are Holger Froehlich, Florian Markowetz, Achim Tresch, Theresa Niederberger, Christian Bender, Matthias Maneck, Claudio Lottaz, Tim Beissbarth, and the package
The files under miscell-files/AParramon_discrete_time are copyright Alberto Parramon, unless otherwise specified. This is an implementation of a discrete-time version of OncoSimulR.
Build status badges: R CMD check on Bioconductor (multiple platforms; release and devel), Travis CI (Linux), and Appveyor (Windows); test coverage.
(Note: Appveyor and Travis can fail for reasons that have nothing to do with the package, such as R not being downloaded correctly, etc. Look at the details of each failure. Similarly, some of the errors in BioConductor, especially in the development branch and on Windows, can be caused by some required packages not yet being available, often "car" and "igraph".
Again, look at the details of each failure.)