From 3a7515c6d7e236eea53c9ee0b36f9b993a4d8fb8 Mon Sep 17 00:00:00 2001 From: Gibs Date: Fri, 22 Jul 2022 12:57:12 -0700 Subject: [PATCH] Add docs page on Negative Binomial model (#77) * Add NB docs * Fix some documentation typos --- docs/index.rst | 1 + docs/models.rst | 2 -- docs/negative_binomial.rst | 47 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 48 insertions(+), 2 deletions(-) create mode 100644 docs/negative_binomial.rst diff --git a/docs/index.rst b/docs/index.rst index 0452215..c0e4edd 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -46,6 +46,7 @@ If you are planning on contributing to BIRDMAn you must also install the followi :caption: How BIRDMAn Works bayesian_inference + negative_binomial .. toctree:: :maxdepth: 2 diff --git a/docs/models.rst b/docs/models.rst index c554f05..8dfd869 100644 --- a/docs/models.rst +++ b/docs/models.rst @@ -21,7 +21,6 @@ Table Model You should inherit/instantiate this class if you are building a custom model for estimating parameters of an entire table at once. .. autoclass:: birdman.model_base.TableModel - :members: Single Feature Model -------------------- @@ -29,7 +28,6 @@ Single Feature Model This class is designed for those interested in parallelizing model fitting across multiple features at once. We do not explicitly perform parallelization but rather leave that to the user. .. autoclass:: birdman.model_base.SingleFeatureModel - :members: Model Iterator -------------- diff --git a/docs/negative_binomial.rst b/docs/negative_binomial.rst new file mode 100644 index 0000000..9004b3d --- /dev/null +++ b/docs/negative_binomial.rst @@ -0,0 +1,47 @@ +Negative binomial models +======================== + +You might notice that we use the negative binomial (NB) model quite a lot in BIRDMAn. Here we will briefly explain why we use this distribution for modelling microbiome data. Do note, however, that BIRDMAn is flexible to custom models implementing other distributions such as Dirichlet-multinomial, Poisson-lognormal, etc. + +Count models +------------ + +When we are working with microbiome sequencing (or really most other 'omics data types), our results are typically *counts*. As such, we can't use simple linear regression (which includes fractions and negative numbers) to model the number of each microbe in each sample. We instead want to use a statistical distribution that is explicitly defined on the set of whole numbers. The simplest of such models is the `Poisson distribution `_. + +The Poisson distribution has a single parameter, :math:`\lambda` that defines the rate of "events". In this distribution, the mean and variance are the same, with both being equal to :math:`\lambda`. This poses a problem for microbiome data, since the variance is typically much greater than the mean. For more on this, see `McMurdie & Holmes 2014 `_. + +Negative binomial model +----------------------- + +Enter the negative binomial model. The negative binomial distribution can be thought of as a Poisson with an allowance for extra variance. This is useful for microbiome data, as we can account for the count nature of the data while accomodating inflated variance. + +We use the following parameterization from Stan where :math:`i` represents a sample and :math:`j` represents a microbe. + +.. math:: + + y_{ij} = \textrm{NegativeBinomial}(\mu_{ij}, \phi_j) + +In this model, :math:`mu_{ij}` represents the mean count of this microbe in this sample and :math:`\phi_j` represents the dispersion parameter for this microbe. This model is useful because we can model one or both parameters hierarchically according to a generalized linear model with log-link function. + +For example, if we expect that the mean abundance depends on whether the sample is a case or control, we can add a parameter and fit it using BIRDMAn. + +.. math:: + + \ln(\mu_{ij}) = \beta_{0j} + \beta_{1j} x_i + \ln(\textrm{Depth}_i) + +In this equation, :math:`\beta_0` is the *intercept* term. This term is the average (log) control sample proportion. For more on this see `this blogpost `_. + +Here, :math:`x_i` is a binary vector representing whether sample :math:`i` is a case sample (1) or a control sample (0). Thus, :math:`\beta_1` represents the log-fold change of microbial abundance between cases and controls. + +Finally, we account for the fact that sampling depth differs across samples by adding a correction term of the log depth of the modeled sample. + +.. note:: + + This is a pretty basic equation but we can modify it depending on our experimental design or specific questions. For example we could add more covariates, subject effects, or even model the dispersion as a sample dependent parameter! + +Output +------ + +When we fit this model using BIRDMAn, we get *distributions* of plausible parameter values given our data and priors. For a single microbe, we typically are interested in :math:`\beta_0`, :math:`\beta_1`, and :math:`\phi`. When we fit all of the microbes, we end up with :math:`D` distributions for each of these parameter where D is the total number of microbes in the table. + +You can imagine that if we, for example, modeled dispersion as a function of both sample and microbe, we would get :math:`N \times D` distributions where :math:`N` is the number of samples in our table.