Merge pull request #15 from gibsramen/log-ratio
Generalize Handler interface for 1D & 2D data
gibsramen authored May 10, 2022
2 parents 824fb2a + abfc494 commit 8d458bb
Showing 17 changed files with 339 additions and 272 deletions.
88 changes: 46 additions & 42 deletions README.md
@@ -4,14 +4,14 @@

# Evident

Evident is a tool for performing effect size and power calculations on microbiome diversity data.
Evident is a tool for performing effect size and power calculations on microbiome data.

## Installation

You can install the most up-to-date version of Evident from PyPi using the following command:

```bash
pip install Evident
pip install evident
```

## QIIME 2
@@ -30,7 +30,7 @@ You should see something like this if Evident installed correctly:
```bash
Usage: qiime evident [OPTIONS] COMMAND [ARGS]...

Description: Perform power analysis on microbiome diversity data. Supports
Description: Perform power analysis on microbiome data. Supports
calculation of effect size given metadata covariates and supporting
visualizations.

@@ -46,24 +46,31 @@ Options:
--help Show this message and exit.

Commands:
alpha-effect-size-by-category Alpha diversity effect size by category.
alpha-power-analysis Alpha diversity power analysis.
alpha-power-analysis-repeated-measures
Alpha diversity power analysis for repeated
multivariate-effect-size-by-category
Multivariate data effect size by category.
multivariate-power-analysis Multivariate data power analysis.
plot-power-curve Plot power curve.
univariate-effect-size-by-category
Univariate data effect size by category.
univariate-power-analysis Univariate data power analysis.
univariate-power-analysis-repeated-measures
Univariate data power analysis for repeated
measures.

beta-effect-size-by-category Beta diversity effect size by category.
beta-power-analysis Beta diversity power analysis.
plot-power-curve Plot power curve.
visualize-results Tabulate evident results.
```

## Standalone Usage

Evident requires two input files:
Evident can operate on two types of data:

* Univariate (vector)
* Multivariate (distance matrix)

Univariate data can be alpha diversity, log ratios, PCoA coordinates, etc.
Multivariate data is usually a beta diversity distance matrix.

1. Either an alpha or beta diversity file
2. Sample metadata
For this tutorial we will be using alpha diversity values, but the commands are nearly the same for beta diversity distance matrices.

First, open Python and import Evident

@@ -72,10 +79,6 @@ import evident
```

Next, load your diversity file and sample metadata.
For alpha diversity, this should be a pandas Series.
For beta diversity, this should be an scikit-bio DistanceMatrix.
Sample metadata should be a pandas DataFrame.
We'll be using an alpha diversity vector for this tutorial but the commands are nearly the same for beta diversity distance matrices.

```python
import pandas as pd
@@ -84,17 +87,17 @@ metadata = pd.read_table("data/metadata.tsv", sep="\t", index_col=0)
faith_pd = metadata["faith_pd"]
```

The main data structure in Evident is the 'DiversityHandler'.
This is the way that Evident stores the diversity data and metadata for power calculations.
For our alpha diversity example, we'll load the `AlphaDiversityHandler` class from Evident.
`AlphaDiversityHandler` takes as input the pandas Series with the diversity values and the pandas DataFrame containing the sample metadata.
The main data structure in Evident is the 'DataHandler'.
This is the way that Evident stores the data and metadata for power calculations.
For our alpha diversity example, we'll load the `UnivariateDataHandler` class from Evident.
`UnivariateDataHandler` takes as input the pandas Series with the diversity values and the pandas DataFrame containing the sample metadata.
By default, Evident will only consider metadata columns with at most 5 levels.
To modify this behavior, provide a value for the `max_levels_per_category` argument.
Additionally, Evident will not consider any category levels represented by fewer than 3 samples.
To modify this behavior, use the `min_count_per_level` argument.

```python
adh = evident.AlphaDiversityHandler(faith_pd, metadata)
adh = evident.UnivariateDataHandler(faith_pd, metadata)
```
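
To make the two filters concrete, here is a minimal standard-library sketch of the rule described above. It is not Evident's implementation: the helper name `usable_levels` is hypothetical, and the order of the checks (column-level filter first, then the per-level filter) is an assumption made for illustration. Only the two argument names and their defaults come from the text above.

```python
from collections import Counter


def usable_levels(values, max_levels_per_category=5, min_count_per_level=3):
    """Return the levels of one metadata column that pass both filters.

    Hypothetical helper; the order of the two checks is an assumption.
    """
    # Count how many samples fall in each level, ignoring missing values.
    counts = Counter(v for v in values if v is not None)

    # Column-level filter: skip columns with too many distinct levels.
    if len(counts) > max_levels_per_category:
        return []

    # Level-level filter: skip levels represented by too few samples.
    return sorted(
        level for level, n in counts.items() if n >= min_count_per_level
    )


# Toy column: 4 CD samples, 3 UC samples, 1 IC sample.
classification = ["CD"] * 4 + ["UC"] * 3 + ["IC"] * 1
print(usable_levels(classification))
```

With the defaults, the single `IC` sample falls below `min_count_per_level`, so only `CD` and `UC` would be compared.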

Next, let's say we want to get the effect size of the diversity differences between two groups of samples.
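
For a column with two levels, the effect size is reported as Cohen's d. As a rough illustration of what that number means, the sketch below computes it from the usual pooled-standard-deviation definition using only the standard library. The function name `cohens_d_sketch` is hypothetical, and taking the absolute value of the difference is an assumption; Evident's own `calculate_cohens_d` may differ in detail.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_sketch(group_a, group_b):
    """Cohen's d with a pooled standard deviation (illustrative only)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Standardized difference between the two group means.
    return abs(mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy values: means differ by 1 and both sample variances are 1.
d = cohens_d_sketch([1, 2, 3], [2, 3, 4])
print(d)
```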
@@ -180,9 +183,6 @@ You can also check the "Show scatter points" box to overlay the raw data onto th

![Bokeh Data Panel](https://raw.githubusercontent.com/biocore/evident/master/imgs/bokeh_panel_2.png)

We provide a command line script to generate an interactive app using some test data.
You can access this script at `evident/tests/make_interactive.py`.

Note that because evident uses Python to perform the power calculations, it is at the moment *not* possible to embed this interactive app into a standalone webpage.

## QIIME 2 Usage
@@ -193,21 +193,23 @@ If not, we recommend you read the excellent [documentation](https://docs.qiime2.
Note that we have only tested Evident on QIIME 2 version 2021.11.
If you are using a different version and encounter an error please let us know via an issue.

As with the standalone version, Evident requires a diversity file and a sample metadata file.
These inputs are expected to conform to QIIME 2 standards.

To calculate power, we can run the following command:

```bash
qiime evident alpha-power-analysis \
--i-alpha-diversity faith_pd.qza \
qiime evident univariate-power-analysis \
--m-sample-metadata-file metadata.qza \
--m-sample-metadata-column classification \
--m-sample-metadata-file faith_pd.qza \
--p-data-column faith_pd \
--p-group-column classification \
--p-alpha 0.01 0.05 0.1 \
--p-total-observations $(seq 10 10 100) \
--o-power-analysis-results results.qza
```

We provide multiple sample metadata files to QIIME 2 because they are internally merged.
You should provide a value for `--p-data-column` so Evident knows which column in the merged metadata contains the numeric values (this is only necessary for univariate analysis).
In this case, the name of the `faith_pd.qza` vector is `faith_pd` so we use that as input.
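
The merge-then-select idea can be sketched in plain Python. This is only an illustration of the concept: the helper `merge_by_sample_id` is hypothetical, the toy tables are made up, and treating the merge as an inner join on sample ID is an assumption about QIIME 2's behavior.

```python
def merge_by_sample_id(*tables):
    """Inner-join tables shaped like {sample_id: {column: value}}."""
    # Keep only the sample IDs present in every table.
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)

    merged = {}
    for sample_id in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[sample_id])
        merged[sample_id] = row
    return merged


# Toy stand-ins for metadata.qza and faith_pd.qza.
metadata = {
    "S1": {"classification": "CD"},
    "S2": {"classification": "UC"},
    "S3": {"classification": "CD"},
}
faith_pd = {"S1": {"faith_pd": 12.5}, "S2": {"faith_pd": 9.1}}

merged = merge_by_sample_id(metadata, faith_pd)

# --p-data-column names the numeric column; --p-group-column names the grouping.
values = {sid: row["faith_pd"] for sid, row in merged.items()}
groups = {sid: row["classification"] for sid, row in merged.items()}
print(values, groups)
```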

Notice how we used `$(seq 10 10 100)` to provide input into the `--p-total-observations` argument.
`seq` is a command on UNIX-like systems that generates a sequence of numbers.
In our example, we used `seq` to generate the values from 10 to 100 in intervals of 10 (10, 20, ..., 100).
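
If you are working in Python rather than a shell, the same sequence can be built with `range`. This is just the Python counterpart of `seq 10 10 100`; the variable name is illustrative.

```python
# range's stop value is exclusive, so use 101 to include 100.
total_observations = list(range(10, 101, 10))
print(total_observations)
```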
@@ -247,29 +249,30 @@ effect_size_by_category(
With QIIME 2:

```bash
qiime evident alpha-effect-size-by-category \
--i-alpha-diversity faith_pd.qza \
qiime evident univariate-effect-size-by-category \
--m-sample-metadata-file metadata.qza \
--p-columns classification sex cd_behavior \
--m-sample-metadata-file faith_pd.qza \
--p-data-column faith_pd \
--p-group-columns classification sex cd_behavior \
--p-n-jobs 2 \
--o-effect-size-results alpha_effect_sizes.qza
```

## Repeated Measures

Evident supports limited analysis of repeated measures.
When your dataset has repeated measures, you can calculate `eta_squared` for alpha diversity differences.
Note that only alpha diversity is supported with repeated measures.
When your dataset has repeated measures, you can calculate `eta_squared` for univariate data.
Note that multivariate data is not supported for repeated measures analysis.
Power analysis for repeated measures implements a repeated measures ANOVA.
Additionally, when performing power analysis *only* power can be calculated (in contrast to `AlphaDiversityHandler` and `BetaDiversityHandler` where alpha, significance, and observations can be calculated).
Additionally, when performing power analysis *only* power can be calculated (in contrast to `UnivariateDataHandler` and `MultivariateDataHandler` where alpha, power, and observations can be calculated).
This power analysis assumes that the number of measurements per group is equal.

With Python:

```python
from evident.diversity_handler import RepeatedMeasuresAlphaDiversityHandler
from evident.data_handler import RepeatedMeasuresUnivariateDataHandler

rmadh = RepeatedMeasuresAlphaDiversityHandler(
rmadh = RepeatedMeasuresUnivariateDataHandler(
faith_pd,
metadata,
individual_id_column="subject",
@@ -288,9 +291,10 @@ power_analysis_result = rmandh.power_analysis(
With QIIME 2:

```
qiime evident alpha-power-analysis-repeated-measures \
--i-alpha-diversity faith_pd.qza \
--m-sample-metadata metadata.qza \
qiime evident univariate-power-analysis-repeated-measures \
--m-sample-metadata-file metadata.qza \
--m-sample-metadata-file faith_pd.qza \
--p-data-column faith_pd \
--p-individual-id-column subject \
--p-state-column group \
--p-subjects 2 4 5 \
4 changes: 2 additions & 2 deletions evident/__init__.py
@@ -1,6 +1,6 @@
from .diversity_handler import AlphaDiversityHandler, BetaDiversityHandler
from .data_handler import UnivariateDataHandler, MultivariateDataHandler


__version__ = "0.3.0"

__all__ = ["AlphaDiversityHandler", "BetaDiversityHandler"]
__all__ = ["UnivariateDataHandler", "MultivariateDataHandler"]
29 changes: 13 additions & 16 deletions evident/diversity_handler.py → evident/data_handler.py
@@ -18,8 +18,8 @@
from .utils import _listify, _check_sample_overlap


class _BaseDiversityHandler(ABC):
"""Abstract class for handling diversity data and metadata."""
class _BaseDataHandler(ABC):
"""Abstract class for handling data and metadata."""
def __init__(
self,
data=None,
@@ -96,10 +96,7 @@ def calculate_effect_size(
column: str,
difference: float = None
) -> EffectSizeResult:
"""Get effect size of diversity differences given column.
If a subject column was provided, all effect sizes will be calculated
as eta squared from a repeated measures ANOVA.
"""Get effect size of data differences given column.
Otherwise, if two categories, return Cohen's d from t-test. If more
than two categories, return Cohen's f from ANOVA.
@@ -153,7 +150,7 @@ def power_analysis(
alpha: float = None,
power: float = None
) -> Union[CrossSectionalPowerAnalysisResult, PowerAnalysisResults]:
"""Perform power analysis using this diversity dataset.
"""Perform power analysis using this dataset.
Exactly one of total_observations, alpha, or power must be None.
@@ -376,7 +373,7 @@ def _create_partial_power_func(
return power_func


class AlphaDiversityHandler(_BaseDiversityHandler):
class UnivariateDataHandler(_BaseDataHandler):
def __init__(
self,
data: pd.Series,
@@ -385,9 +382,9 @@ def __init__(
min_count_per_level: int = 3,
**kwargs
):
"""Handler for alpha diversity data.
"""Handler for univariate data.
:param data: Alpha diversity vector
:param data: Univariate data vector
:type data: pd.Series
:param metadata: Sample metadata
@@ -422,11 +419,11 @@ def __init__(
)

def subset_values(self, ids: list) -> np.array:
"""Get alpha-diversity differences among provided samples."""
"""Get univariate data differences among provided samples."""
return self.data.loc[ids].values


class RepeatedMeasuresAlphaDiversityHandler(AlphaDiversityHandler):
class RepeatedMeasuresUnivariateDataHandler(UnivariateDataHandler):
def __init__(
self,
data: pd.Series,
@@ -546,17 +543,17 @@ def _bulk_power_analysis(
return PowerAnalysisResults(results_list)


class BetaDiversityHandler(_BaseDiversityHandler):
class MultivariateDataHandler(_BaseDataHandler):
def __init__(
self,
data: DistanceMatrix,
metadata: pd.DataFrame,
max_levels_per_category: int = 5,
min_count_per_level: int = 3,
):
"""Handler for beta diversity data.
"""Handler for multivariate data.
:param data: Beta diversity distance matrix
:param data: Multivariate distance matrix
:type data: skbio.DistanceMatrix
:param metadata: Sample metadata
@@ -582,5 +579,5 @@ def __init__(
)

def subset_values(self, ids: list) -> np.array:
"""Get beta-diversity differences among provided samples."""
"""Get multivariate data differences among provided samples."""
return np.array(self.data.filter(ids).to_series().values)
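
The shape of the generalized interface can be illustrated with a small standard-library sketch: an abstract base class declares `subset_values`, and each subclass decides what "the values for these samples" means. All class names ending in `Sketch` are hypothetical, the dict-based data layouts stand in for `pd.Series` and `skbio.DistanceMatrix`, and reading the multivariate case as "one distance per pair of samples" is an assumption about what `filter(ids).to_series()` yields.

```python
from abc import ABC, abstractmethod
from itertools import combinations


class BaseHandlerSketch(ABC):
    """Stand-in for _BaseDataHandler: shared logic lives here."""

    def __init__(self, data):
        self.data = data

    @abstractmethod
    def subset_values(self, ids):
        """Return the numeric values associated with the given sample IDs."""


class UnivariateSketch(BaseHandlerSketch):
    # data layout: {sample_id: value}
    def subset_values(self, ids):
        # One value per sample.
        return [self.data[i] for i in ids]


class MultivariateSketch(BaseHandlerSketch):
    # data layout: {frozenset({id_a, id_b}): distance}
    def subset_values(self, ids):
        # One value per unordered pair of samples.
        return [self.data[frozenset(pair)] for pair in combinations(ids, 2)]


uni = UnivariateSketch({"S1": 1.0, "S2": 2.5, "S3": 4.0})
multi = MultivariateSketch({
    frozenset(("S1", "S2")): 0.2,
    frozenset(("S1", "S3")): 0.7,
    frozenset(("S2", "S3")): 0.5,
})

print(uni.subset_values(["S1", "S3"]))
print(multi.subset_values(["S1", "S2", "S3"]))
```

Because downstream code only calls `subset_values`, it does not need to know whether the handler wraps a vector or a distance matrix.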
18 changes: 9 additions & 9 deletions evident/effect_size.py
@@ -3,13 +3,13 @@
from joblib import Parallel, delayed
import pandas as pd

from evident.diversity_handler import _BaseDiversityHandler
from evident.data_handler import _BaseDataHandler
from evident.stats import calculate_cohens_d
from evident.results import EffectSizeResults, PairwiseEffectSizeResult


def effect_size_by_category(
diversity_handler: _BaseDiversityHandler,
data_handler: _BaseDataHandler,
columns: list = None,
n_jobs: int = None,
parallel_args: dict = None
@@ -22,8 +22,8 @@ def effect_size_by_category(
numeric effect size. Sorts output first by Cohen's d -> f and then effect
size in decreasing order.
:param diversity_handler: Either an alpha or beta DiversityHandler
:type diversity_handler: evident.diversity_handler._BaseDiversityHandler
:param data_handler: Either a univariate or multivariate DataHandler
:type data_handler: evident.data_handler._BaseDataHandler
:param columns: Columns to use for effect size calculations
:type columns: List[str]
@@ -41,7 +41,7 @@ def effect_size_by_category(
:rtype: pd.DataFrame
"""
_check_columns(columns)
dh = diversity_handler
dh = data_handler

if parallel_args is None:
parallel_args = dict()
@@ -55,7 +55,7 @@ def effect_size_by_category(


def pairwise_effect_size_by_category(
diversity_handler: _BaseDiversityHandler,
data_handler: _BaseDataHandler,
columns: list = None,
n_jobs: int = None,
parallel_args: dict = None
@@ -69,8 +69,8 @@ def pairwise_effect_size_by_category(
'column'. 'cohens_d' has the effect size of each comparison. Output is
sorted by decreasing 'cohens_d'.
:param diversity_handler: Either an alpha or beta DiversityHandler
:type diversity_handler: evident.diversity_handler._BaseDiversityHandler
:param data_handler: Either a univariate or multivariate DataHandler
:type data_handler: evident.data_handler._BaseDataHandler
:param columns: Columns to use for effect size calculations
:type columns: List[str]
@@ -88,7 +88,7 @@ def pairwise_effect_size_by_category(
:rtype: pd.DataFrame
"""
_check_columns(columns)
dh = diversity_handler
dh = data_handler

if parallel_args is None:
parallel_args = dict()