diff --git a/README.md b/README.md index 267bfb1..1adb0ae 100644 --- a/README.md +++ b/README.md @@ -4,14 +4,14 @@ # Evident -Evident is a tool for performing effect size and power calculations on microbiome diversity data. +Evident is a tool for performing effect size and power calculations on microbiome data. ## Installation You can install the most up-to-date version of Evident from PyPi using the following command: ```bash -pip install Evident +pip install evident ``` ## QIIME 2 @@ -30,7 +30,7 @@ You should see something like this if Evident installed correctly: ```bash Usage: qiime evident [OPTIONS] COMMAND [ARGS]... - Description: Perform power analysis on microbiome diversity data. Supports + Description: Perform power analysis on microbiome data. Supports calculation of effect size given metadata covariates and supporting visualizations. @@ -46,24 +46,31 @@ Options: --help Show this message and exit. Commands: - alpha-effect-size-by-category Alpha diversity effect size by category. - alpha-power-analysis Alpha diversity power analysis. - alpha-power-analysis-repeated-measures - Alpha diversity power analysis for repeated + multivariate-effect-size-by-category + Multivariate data effect size by category. + multivariate-power-analysis Multivariate data power analysis. + plot-power-curve Plot power curve. + univariate-effect-size-by-category + Univariate data effect size by category. + univariate-power-analysis Univariate data power analysis. + univariate-power-analysis-repeated-measures + Univariate data power analysis for repeated measures. - beta-effect-size-by-category Beta diversity effect size by category. - beta-power-analysis Beta diversity power analysis. - plot-power-curve Plot power curve. visualize-results Tabulate evident results. ``` ## Standalone Usage -Evident requires two input files: +Evident can operate on two types of data: + +* Univariate (vector) +* Multivariate (distance matrix) ++Univariate data can be alpha diversity,
log ratios, PCoA coordinates, etc. +Multivariate data is usually a beta diversity distance matrix. -1. Either an alpha or beta diversity file -2. Sample metadata +For this tutorial we will be using alpha diversity values, but the commands are nearly the same for beta diversity distance matrices. First, open Python and import Evident @@ -72,10 +79,6 @@ import evident ``` Next, load your diversity file and sample metadata. -For alpha diversity, this should be a pandas Series. -For beta diversity, this should be an scikit-bio DistanceMatrix. -Sample metadata should be a pandas DataFrame. -We'll be using an alpha diversity vector for this tutorial but the commands are nearly the same for beta diversity distance matrices. ```python import pandas as pd @@ -84,17 +87,17 @@ metadata = pd.read_table("data/metadata.tsv", sep="\t", index_col=0) faith_pd = metadata["faith_pd"] ``` -The main data structure in Evident is the 'DiversityHandler'. -This is the way that Evident stores the diversity data and metadata for power calculations. -For our alpha diversity example, we'll load the `AlphaDiversityHandler` class from Evident. -`AlphaDiversityHandler` takes as input the pandas Series with the diversity values and the pandas DataFrame containing the sample metadata. +The main data structure in Evident is the 'DataHandler'. +This is the way that Evident stores the data and metadata for power calculations. +For our alpha diversity example, we'll load the `UnivariateDataHandler` class from Evident. +`UnivariateDataHandler` takes as input the pandas Series with the diversity values and the pandas DataFrame containing the sample metadata. By default, Evident will only consider metadata columns with, at max, 5 levels. To modify this behavior, provide a value for the `max_levels_per_category` argument. Additionally, Evident will not consider any category levels represented by fewer than 3 samples. To modify this behavior, use the `min_count_per_level` argument. 
```python -adh = evident.AlphaDiversityHandler(faith_pd, metadata) +adh = evident.UnivariateDataHandler(faith_pd, metadata) ``` Next, let's say we want to get the effect size of the diversity differences between two groups of samples. @@ -180,9 +183,6 @@ You can also check the "Show scatter points" box to overlay the raw data onto th ![Bokeh Data Panel](https://raw.githubusercontent.com/biocore/evident/master/imgs/bokeh_panel_2.png) -We provide a command line script to generate an interactive app using some test data. -You can access this script at `evident/tests/make_interactive.py`. - Note that because evident uses Python to perform the power calculations, it is at the moment *not* possible to embed this interactive app into a standalone webpage. ## QIIME 2 Usage @@ -193,21 +193,23 @@ If not, we recommend you read the excellent [documentation](https://docs.qiime2. Note that we have only tested Evident on QIIME 2 version 2021.11. If you are using a different version and encounter an error please let us know via an issue. -As with the standalone version, Evident requires a diversity file and a sample metadata file. -These inputs are expected to conform to QIIME 2 standards. - To calculate power, we can run the following command: ```bash -qiime evident alpha-power-analysis \ - --i-alpha-diversity faith_pd.qza \ +qiime evident univariate-power-analysis \ --m-sample-metadata-file metadata.qza \ - --m-sample-metadata-column classification \ + --m-sample-metadata-file faith_pd.qza \ + --p-data-column faith_pd \ + --p-group-column classification \ --p-alpha 0.01 0.05 0.1 \ --p-total-observations $(seq 10 10 100) \ --o-power-analysis-results results.qza ``` +We provide multiple sample metadata files to QIIME 2 because QIIME 2 merges them internally. +You should provide a value for `--p-data-column` so Evident knows which column in the merged metadata contains the numeric values (this is only necessary for univariate analysis).
+In this case, the name of the `faith_pd.qza` vector is `faith_pd`, so we use that as input. + Notice how we used `$(seq 10 10 100)` to provide input into the `--p-total-observations` argument. `seq` is a command on UNIX-like systems that generates a sequence of numbers. In our example, we used `seq` to generate the values from 10 to 100 in intervals of 10 (10, 20, ..., 100). @@ -247,10 +249,11 @@ effect_size_by_category( With QIIME 2: ```bash -qiime evident alpha-effect-size-by-category \ - --i-alpha-diversity faith_pd.qza \ +qiime evident univariate-effect-size-by-category \ --m-sample-metadata-file metadata.qza \ - --p-columns classification sex cd_behavior \ + --m-sample-metadata-file faith_pd.qza \ + --p-data-column faith_pd \ + --p-group-columns classification sex cd_behavior \ --p-n-jobs 2 \ --o-effect-size-results alpha_effect_sizes.qza ``` @@ -258,18 +261,18 @@ ## Repeated Measures Evident supports limited analysis of repeated measures. -When your dataset has repeated measures, you can calculate `eta_squared` for alpha diversity differences. -Note that only alpha diversity is supported with repeated measures. +When your dataset has repeated measures, you can calculate `eta_squared` for univariate data. +Note that multivariate data is not supported for repeated measures analysis. Power analysis for repeated measures implements a repeated measures ANOVA. -Additionally, when performing power analysis *only* power can be calculated (in contrast to `AlphaDiversityHandler` and `BetaDiversityHandler` where alpha, significance, and observations can be calculated). +Additionally, when performing power analysis, *only* power can be calculated (in contrast to `UnivariateDataHandler` and `MultivariateDataHandler`, where alpha, power, and observations can be calculated). This power analysis assumes that the number of measurements per group is equal.
With Python: ```python -from evident.diversity_handler import RepeatedMeasuresAlphaDiversityHandler +from evident.data_handler import RepeatedMeasuresUnivariateDataHandler -rmadh = RepeatedMeasuresAlphaDiversityHandler( +rmadh = RepeatedMeasuresUnivariateDataHandler( faith_pd, metadata, individual_id_column="subject", @@ -288,9 +291,10 @@ power_analysis_result = rmandh.power_analysis( With QIIME 2: ``` -qiime evident alpha-power-analysis-repeated-measures \ - --i-alpha-diversity faith_pd.qza \ - --m-sample-metadata metadata.qza \ +qiime evident univariate-power-analysis-repeated-measures \ + --m-sample-metadata-file metadata.qza \ + --m-sample-metadata-file faith_pd.qza \ + --p-data-column faith_pd \ --p-individual-id-column subject \ --p-state-column group \ --p-subjects 2 4 5 \ diff --git a/evident/__init__.py b/evident/__init__.py index d1707d6..7119589 100644 --- a/evident/__init__.py +++ b/evident/__init__.py @@ -1,6 +1,6 @@ -from .diversity_handler import AlphaDiversityHandler, BetaDiversityHandler +from .data_handler import UnivariateDataHandler, MultivariateDataHandler __version__ = "0.3.0" -__all__ = ["AlphaDiversityHandler", "BetaDiversityHandler"] +__all__ = ["UnivariateDataHandler", "MultivariateDataHandler"] diff --git a/evident/diversity_handler.py b/evident/data_handler.py similarity index 95% rename from evident/diversity_handler.py rename to evident/data_handler.py index 50d685d..bd89d6a 100644 --- a/evident/diversity_handler.py +++ b/evident/data_handler.py @@ -18,8 +18,8 @@ from .utils import _listify, _check_sample_overlap -class _BaseDiversityHandler(ABC): - """Abstract class for handling diversity data and metadata.""" +class _BaseDataHandler(ABC): + """Abstract class for handling data and metadata.""" def __init__( self, data=None, @@ -96,10 +96,7 @@ def calculate_effect_size( column: str, difference: float = None ) -> EffectSizeResult: - """Get effect size of diversity differences given column. 
- - If a subject column was provided, all effect sizes will be calculated - as eta squared from a repeated measures ANOVA. + """Get effect size of data differences given column. Otherwise, if two categories, return Cohen's d from t-test. If more than two categories, return Cohen's f from ANOVA. @@ -153,7 +150,7 @@ def power_analysis( alpha: float = None, power: float = None ) -> Union[CrossSectionalPowerAnalysisResult, PowerAnalysisResults]: - """Perform power analysis using this diversity dataset. + """Perform power analysis using this dataset. Exactly one of total_observations, alpha, or power must be None. @@ -376,7 +373,7 @@ def _create_partial_power_func( return power_func -class AlphaDiversityHandler(_BaseDiversityHandler): +class UnivariateDataHandler(_BaseDataHandler): def __init__( self, data: pd.Series, @@ -385,9 +382,9 @@ def __init__( min_count_per_level: int = 3, **kwargs ): - """Handler for alpha diversity data. + """Handler for univariate data. - :param data: Alpha diversity vector + :param data: Univariate data vector :type data: pd.Series :param metadata: Sample metadata @@ -422,11 +419,11 @@ def __init__( ) def subset_values(self, ids: list) -> np.array: - """Get alpha-diversity differences among provided samples.""" + """Get univariate data differences among provided samples.""" return self.data.loc[ids].values -class RepeatedMeasuresAlphaDiversityHandler(AlphaDiversityHandler): +class RepeatedMeasuresUnivariateDataHandler(UnivariateDataHandler): def __init__( self, data: pd.Series, @@ -546,7 +543,7 @@ def _bulk_power_analysis( return PowerAnalysisResults(results_list) -class BetaDiversityHandler(_BaseDiversityHandler): +class MultivariateDataHandler(_BaseDataHandler): def __init__( self, data: DistanceMatrix, @@ -554,9 +551,9 @@ def __init__( max_levels_per_category: int = 5, min_count_per_level: int = 3, ): - """Handler for beta diversity data. + """Handler for multivariate data. 
- :param data: Beta diversity distance matrix + :param data: Multivariate distance matrix :type data: skbio.DistanceMatrix :param metadata: Sample metadata @@ -582,5 +579,5 @@ def __init__( ) def subset_values(self, ids: list) -> np.array: - """Get beta-diversity differences among provided samples.""" + """Get multivariate data differences among provided samples.""" return np.array(self.data.filter(ids).to_series().values) diff --git a/evident/effect_size.py b/evident/effect_size.py index 80b2b0a..802e552 100644 --- a/evident/effect_size.py +++ b/evident/effect_size.py @@ -3,13 +3,13 @@ from joblib import Parallel, delayed import pandas as pd -from evident.diversity_handler import _BaseDiversityHandler +from evident.data_handler import _BaseDataHandler from evident.stats import calculate_cohens_d from evident.results import EffectSizeResults, PairwiseEffectSizeResult def effect_size_by_category( - diversity_handler: _BaseDiversityHandler, + data_handler: _BaseDataHandler, columns: list = None, n_jobs: int = None, parallel_args: dict = None @@ -22,8 +22,8 @@ def effect_size_by_category( numeric effect size. Sorts output first by Cohen's d -> f and then effect size in decreasing order. 
- :param diversity_handler: Either an alpha or beta DiversityHandler - :type diversity_handler: evident.diversity_handler._BaseDiversityHandler + :param data_handler: Either a univariate or multivariate DataHandler + :type data_handler: evident.data_handler._BaseDataHandler :param columns: Columns to use for effect size calculations :type columns: List[str] @@ -41,7 +41,7 @@ :rtype: pd.DataFrame """ _check_columns(columns) - dh = diversity_handler + dh = data_handler if parallel_args is None: parallel_args = dict() @@ -55,7 +55,7 @@ def pairwise_effect_size_by_category( - diversity_handler: _BaseDiversityHandler, + data_handler: _BaseDataHandler, columns: list = None, n_jobs: int = None, parallel_args: dict = None @@ -69,8 +69,8 @@ 'column'. 'cohens_d' has the effect size of each comparison. Output is sorted by decreasing 'cohens_d'. - :param diversity_handler: Either an alpha or beta DiversityHandler - :type diversity_handler: evident.diversity_handler._BaseDiversityHandler + :param data_handler: Either a univariate or multivariate DataHandler + :type data_handler: evident.data_handler._BaseDataHandler :param columns: Columns to use for effect size calculations :type columns: List[str] @@ -88,7 +88,7 @@ :rtype: pd.DataFrame """ _check_columns(columns) - dh = diversity_handler + dh = data_handler if parallel_args is None: parallel_args = dict() diff --git a/evident/interactive.py b/evident/interactive.py index f9f4a27..5a363c1 100644 --- a/evident/interactive.py +++ b/evident/interactive.py @@ -1,19 +1,19 @@ import os import shutil -from evident.diversity_handler import (_BaseDiversityHandler, - AlphaDiversityHandler, - BetaDiversityHandler) +from evident.data_handler import (_BaseDataHandler, + UnivariateDataHandler, + MultivariateDataHandler) def create_bokeh_app( - diversity_handler: _BaseDiversityHandler, + data_handler: _BaseDataHandler, output:
os.PathLike, ) -> None: """Creates interactive power analysis using Bokeh. - :param diversity_handler: Handler with diversity data - :type diversity_handler: evident.diversity_handler._BaseDiversityHandler + :param data_handler: Handler with data + :type data_handler: evident.data_handler._BaseDataHandler :param output: Location to create Bokeh app :type output: os.PathLike @@ -26,16 +26,16 @@ def create_bokeh_app( data_dir = os.path.join(output, "data") os.mkdir(data_dir) - md = diversity_handler.metadata.copy() + md = data_handler.metadata.copy() md_loc = os.path.join(data_dir, "metadata.tsv") md.to_csv(md_loc, sep="\t", index=True) - data = diversity_handler.data - if isinstance(diversity_handler, AlphaDiversityHandler): - data_loc = os.path.join(data_dir, "diversity.alpha.tsv") + data = data_handler.data + if isinstance(data_handler, UnivariateDataHandler): + data_loc = os.path.join(data_dir, "data.univariate.tsv") data.to_csv(data_loc, sep="\t", index=True) - elif isinstance(diversity_handler, BetaDiversityHandler): - data_loc = os.path.join(data_dir, "diversity.beta.lsmat") + elif isinstance(data_handler, MultivariateDataHandler): + data_loc = os.path.join(data_dir, "data.multivariate.lsmat") data.write(data_loc) else: raise ValueError("No valid data found!") diff --git a/evident/q2/_methods.py b/evident/q2/_methods.py index 2644688..2460e9f 100644 --- a/evident/q2/_methods.py +++ b/evident/q2/_methods.py @@ -1,20 +1,27 @@ from typing import List import pandas as pd -from qiime2 import CategoricalMetadataColumn, Metadata +from qiime2 import Metadata from skbio import DistanceMatrix -from evident import AlphaDiversityHandler, BetaDiversityHandler -from evident.diversity_handler import ( - RepeatedMeasuresAlphaDiversityHandler as RDH -) +from evident import UnivariateDataHandler, MultivariateDataHandler +from evident.data_handler import RepeatedMeasuresUnivariateDataHandler as RDH from evident.effect_size import (effect_size_by_category, 
pairwise_effect_size_by_category) -def alpha_power_analysis( - alpha_diversity: pd.Series, - sample_metadata: CategoricalMetadataColumn, +def _check_provided_univariate_data(sample_metadata, data_column): + """Check if provided univariate data is valid.""" + if data_column not in sample_metadata.columns: + raise ValueError(f"{data_column} not found in sample metadata.") + if data_column not in sample_metadata.select_dtypes("number"): + raise ValueError("Values in data_column must be numeric.") + + +def univariate_power_analysis( + sample_metadata: Metadata, + group_column: str, + data_column: str, max_levels_per_category: int = 5, min_count_per_level: int = 3, alpha: list = None, @@ -22,8 +29,13 @@ def alpha_power_analysis( total_observations: list = None, difference: list = None, ) -> pd.DataFrame: - res = _power_analysis(alpha_diversity, sample_metadata, - AlphaDiversityHandler, + sample_metadata = sample_metadata.to_dataframe() + _check_provided_univariate_data(sample_metadata, data_column) + data = sample_metadata[data_column] + sample_metadata = sample_metadata.drop(columns=[data_column]) + + res = _power_analysis(data, sample_metadata, group_column, + UnivariateDataHandler, max_levels_per_category, min_count_per_level, alpha=alpha, power=power, total_observations=total_observations, @@ -31,9 +43,10 @@ def alpha_power_analysis( return res -def beta_power_analysis( - beta_diversity: DistanceMatrix, - sample_metadata: CategoricalMetadataColumn, +def multivariate_power_analysis( + data: DistanceMatrix, + sample_metadata: Metadata, + group_column: str, max_levels_per_category: int = 5, min_count_per_level: int = 3, alpha: list = None, @@ -41,8 +54,9 @@ def beta_power_analysis( total_observations: list = None, difference: list = None, ) -> pd.DataFrame: - res = _power_analysis(beta_diversity, sample_metadata, - BetaDiversityHandler, + sample_metadata = sample_metadata.to_dataframe() + res = _power_analysis(data, sample_metadata, group_column, + 
MultivariateDataHandler, max_levels_per_category, min_count_per_level, alpha=alpha, power=power, total_observations=total_observations, @@ -50,44 +64,48 @@ def beta_power_analysis( return res -def _power_analysis(data, metadata, handler, max_levels_per_category, - min_count_per_level, **kwargs): - md = metadata.to_series() - column = md.name - dh = handler(data, md.to_frame(), max_levels_per_category, +def _power_analysis(data, metadata, group_column, handler, + max_levels_per_category, min_count_per_level, **kwargs): + dh = handler(data, metadata, max_levels_per_category, min_count_per_level) - res = dh.power_analysis(column, **kwargs) + res = dh.power_analysis(group_column, **kwargs) return res.to_dataframe() -def alpha_effect_size_by_category( - alpha_diversity: pd.Series, +def univariate_effect_size_by_category( sample_metadata: Metadata, - columns: List[str], + group_columns: List[str], + data_column: str, pairwise: bool = False, n_jobs: int = None, max_levels_per_category: int = 5, min_count_per_level: int = 3 ) -> pd.DataFrame: - res = _effect_size_by_category(alpha_diversity, sample_metadata, - AlphaDiversityHandler, columns, pairwise, - n_jobs, max_levels_per_category, + sample_metadata = sample_metadata.to_dataframe() + _check_provided_univariate_data(sample_metadata, data_column) + data = sample_metadata[data_column] + sample_metadata = sample_metadata.drop(columns=[data_column]) + + res = _effect_size_by_category(data, sample_metadata, + UnivariateDataHandler, group_columns, + pairwise, n_jobs, max_levels_per_category, min_count_per_level) return res -def beta_effect_size_by_category( - beta_diversity: DistanceMatrix, +def multivariate_effect_size_by_category( + data: DistanceMatrix, sample_metadata: Metadata, - columns: List[str], + group_columns: List[str], pairwise: bool = False, n_jobs: int = None, max_levels_per_category: int = 5, min_count_per_level: int = 3 ) -> pd.DataFrame: - res = _effect_size_by_category(beta_diversity, sample_metadata, - 
BetaDiversityHandler, columns, pairwise, - n_jobs, max_levels_per_category, + sample_metadata = sample_metadata.to_dataframe() + res = _effect_size_by_category(data, sample_metadata, + MultivariateDataHandler, group_columns, + pairwise, n_jobs, max_levels_per_category, min_count_per_level) return res @@ -95,7 +113,7 @@ def beta_effect_size_by_category( def _effect_size_by_category(data, metadata, handler, columns, pairwise, n_jobs, max_levels_per_category, min_count_per_level): - dh = handler(data, metadata.to_dataframe(), max_levels_per_category, + dh = handler(data, metadata, max_levels_per_category, min_count_per_level) if pairwise: res = pairwise_effect_size_by_category(dh, columns, n_jobs=n_jobs) @@ -104,11 +122,11 @@ def _effect_size_by_category(data, metadata, handler, columns, pairwise, return res.to_dataframe() -def alpha_power_analysis_repeated_measures( - alpha_diversity: pd.Series, +def univariate_power_analysis_repeated_measures( sample_metadata: Metadata, individual_id_column: str, state_column: str, + data_column: str, subjects: list = None, measurements: list = None, alpha: list = None, @@ -117,7 +135,12 @@ def alpha_power_analysis_repeated_measures( max_levels_per_category: int = 5, min_count_per_level: int = 3, ) -> pd.DataFrame: - dh = RDH(alpha_diversity, sample_metadata.to_dataframe(), + sample_metadata = sample_metadata.to_dataframe() + _check_provided_univariate_data(sample_metadata, data_column) + data = sample_metadata[data_column] + sample_metadata = sample_metadata.drop(columns=[data_column]) + + dh = RDH(data, sample_metadata, individual_id_column, max_levels_per_category, min_count_per_level) diff --git a/evident/q2/plugin_setup.py b/evident/q2/plugin_setup.py index f55837c..6db135d 100644 --- a/evident/q2/plugin_setup.py +++ b/evident/q2/plugin_setup.py @@ -1,19 +1,17 @@ import importlib -from qiime2.plugin import (Plugin, MetadataColumn, Categorical, Int, Float, - List, Range, Choices, Str, Citations, Bool, - Metadata) -from 
q2_types.sample_data import SampleData, AlphaDiversity +from qiime2.plugin import (Plugin, Int, Float, List, Range, Choices, Str, + Citations, Bool, Metadata) from q2_types.distance_matrix import DistanceMatrix from evident import __version__ from ._format import PowerAnalysisResultsDirectoryFormat as PARsDirFmt from ._format import EffectSizeResultsDirectoryFormat as ERsDirFmt from ._type import PowerAnalysisResults, EffectSizeResults -from ._methods import (alpha_power_analysis, beta_power_analysis, - alpha_effect_size_by_category, - beta_effect_size_by_category, - alpha_power_analysis_repeated_measures) +from ._methods import (univariate_power_analysis, multivariate_power_analysis, + univariate_effect_size_by_category, + multivariate_effect_size_by_category, + univariate_power_analysis_repeated_measures) from ._visualizers import plot_power_curve, visualize_results @@ -21,7 +19,8 @@ Correlation = Float % Range(-1, 1, inclusive_end=True) PA_PARAM_DESCS = { - "sample_metadata": "Categorical sample metadata column.", + "group_column": "Column to use for groupings.", + "sample_metadata": "Sample metadata.", "alpha": "Significance level.", "power": ( "Probability of rejecting the null hypothesis given that the " @@ -52,7 +51,7 @@ ES_PARAM_DESCS = { "sample_metadata": "Sample metadata.", - "columns": "List of columns for which to calculate effect size.", + "group_columns": "List of columns for which to calculate effect size.", "pairwise": ( "Whether to calculate pairwise effect sizes within groups " "with more than 2 levels. If true, computes Cohen's d for all " @@ -81,9 +80,9 @@ version=__version__, website="https://github.com/biocore/evident", citations=[citations["Casals-Pascual2020"]], - short_description="Plugin for diversity effect size calculations", + short_description="Plugin for effect size calculations", description=( - "Perform power analysis on microbiome diversity data. Supports " + "Perform power analysis on microbiome data. 
Supports " "calculation of effect size given metadata covariates and supporting " "visualizations." ), @@ -91,12 +90,16 @@ ) +UNIV_PA_PARAM_DESCS = PA_PARAM_DESCS.copy() +UNIV_PA_PARAM_DESCS["data_column"] = "Column in metadata containing data." + plugin.methods.register_function( - function=alpha_power_analysis, - inputs={"alpha_diversity": SampleData[AlphaDiversity]}, - input_descriptions={"alpha_diversity": "Alpha diversity vector"}, + function=univariate_power_analysis, + inputs={}, parameters={ - "sample_metadata": MetadataColumn[Categorical], + "group_column": Str, + "data_column": Str, + "sample_metadata": Metadata, "alpha": List[Probability], "power": List[Probability], "total_observations": List[Int], @@ -104,22 +107,23 @@ "max_levels_per_category": Int, "min_count_per_level": Int }, - parameter_descriptions=PA_PARAM_DESCS, + parameter_descriptions=UNIV_PA_PARAM_DESCS, outputs=[("power_analysis_results", PowerAnalysisResults)], - name="Alpha diversity power analysis.", + name="Univariate data power analysis.", description=( - "Use sample alpha diversity data to perform power calculations " + "Use sample univariate data to perform power calculations " "for desired significance level, power, or sample size. Exactly one " "of alpha, power, or sample size must be excluded." 
) ) plugin.methods.register_function( - function=beta_power_analysis, - inputs={"beta_diversity": DistanceMatrix}, - input_descriptions={"beta_diversity": "Beta diversity distance matrix"}, + function=multivariate_power_analysis, + inputs={"data": DistanceMatrix}, + input_descriptions={"data": "Sample distance matrix"}, parameters={ - "sample_metadata": MetadataColumn[Categorical], + "group_column": Str, + "sample_metadata": Metadata, "alpha": List[Probability], "power": List[Probability], "total_observations": List[Int], @@ -129,16 +133,17 @@ }, parameter_descriptions=PA_PARAM_DESCS, outputs=[("power_analysis_results", PowerAnalysisResults)], - name="Beta diversity power analysis.", + name="Multivariate data power analysis.", description=( - "Use sample beta diversity data to perform power calculations " + "Use sample multivariate data to perform power calculations " "for desired significance level, power, or sample size." ) ) rm_param_descs = { k: v for k, v in PA_PARAM_DESCS.items() - if k not in ["total_observations", "difference", "power"] + if k not in ["total_observations", "difference", "power", + "group_column"] } rm_param_descs["individual_id_column"] = ( "Metadata column containing IDs for individual subjects." @@ -150,14 +155,16 @@ rm_param_descs["measurements"] = "Number of measurements per subject." rm_param_descs["correlation"] = "Correlation between repeated measurements." rm_param_descs["epsilon"] = "Sphericity parameter." +rm_param_descs["data_column"] = "Column in metadata containing data."
plugin.methods.register_function( - function=alpha_power_analysis_repeated_measures, - inputs={"alpha_diversity": SampleData[AlphaDiversity]}, - input_descriptions={"alpha_diversity": "Alpha diversity vector"}, + function=univariate_power_analysis_repeated_measures, + inputs={}, + input_descriptions={}, parameters={ "sample_metadata": Metadata, "individual_id_column": Str, + "data_column": Str, "state_column": Str, "subjects": List[Int], "measurements": List[Int], @@ -169,20 +176,20 @@ }, parameter_descriptions=rm_param_descs, outputs=[("power_analysis_results", PowerAnalysisResults)], - name="Alpha diversity power analysis for repeated measures.", + name="Univariate data power analysis for repeated measures.", description=( - "Use sample alpha diversity data to perform power calculations " + "Use sample univariate data to perform power calculations " "for repeated measures." ) ) plugin.methods.register_function( - function=alpha_effect_size_by_category, - inputs={"alpha_diversity": SampleData[AlphaDiversity]}, - input_descriptions={"alpha_diversity": "Alpha diversity vector"}, + function=univariate_effect_size_by_category, + inputs={}, parameters={ "sample_metadata": Metadata, - "columns": List[Str], + "data_column": Str, + "group_columns": List[Str], "pairwise": Bool, "n_jobs": Int, "max_levels_per_category": Int, @@ -190,20 +197,20 @@ }, parameter_descriptions=ES_PARAM_DESCS, outputs=[("effect_size_results", EffectSizeResults)], - name="Alpha diversity effect size by category.", + name="Univariate data effect size by category.", description=( - "Calculate alpha diversity difference effect size of multiple " + "Calculate univariate data difference effect size of multiple " "categories." 
) ) plugin.methods.register_function( - function=beta_effect_size_by_category, - inputs={"beta_diversity": DistanceMatrix}, - input_descriptions={"beta_diversity": "Beta diversity distance matrix"}, + function=multivariate_effect_size_by_category, + inputs={"data": DistanceMatrix}, + input_descriptions={"data": "Multivariate data distance matrix"}, parameters={ "sample_metadata": Metadata, - "columns": List[Str], + "group_columns": List[Str], "pairwise": Bool, "n_jobs": Int, "max_levels_per_category": Int, @@ -211,9 +218,9 @@ }, parameter_descriptions=ES_PARAM_DESCS, outputs=[("effect_size_results", EffectSizeResults)], - name="Beta diversity effect size by category.", + name="Multivariate data effect size by category.", description=( - "Calculate beta diversity difference effect size of multiple " + "Calculate multivariate data difference effect size of multiple " "categories." ) ) diff --git a/evident/q2/tests/test_plugin.py b/evident/q2/tests/test_plugin.py index 55c2e7b..fbd36b5 100644 --- a/evident/q2/tests/test_plugin.py +++ b/evident/q2/tests/test_plugin.py @@ -26,77 +26,90 @@ def metadata(): @pytest.fixture(scope="module") -def es_results(alpha_artifact, metadata): - res = evident.methods.alpha_effect_size_by_category( - alpha_diversity=alpha_artifact, - sample_metadata=metadata, - columns=["classification", "cd_behavior"] +def metadata_w_data(): + fname = os.path.join(os.path.dirname(__file__), "data/test_metadata.qza") + metadata = Metadata.load(fname).to_dataframe() + fname = os.path.join(os.path.dirname(__file__), "data/test_alpha_div.qza") + metadata["alpha_div"] = Artifact.load(fname).view(pd.Series) + return Metadata(metadata) + + +@pytest.fixture(scope="module") +def es_results(alpha_artifact, metadata_w_data): + res = evident.methods.univariate_effect_size_by_category( + sample_metadata=metadata_w_data, + data_column="alpha_div", + group_columns=["classification", "cd_behavior"] ).effect_size_results return res @pytest.fixture(scope="module") -def 
pa_results(alpha_artifact, metadata):
-    res = evident.methods.alpha_power_analysis(
-        alpha_diversity=alpha_artifact,
-        sample_metadata=metadata.get_column("cd_behavior"),
+def pa_results(alpha_artifact, metadata_w_data):
+    res = evident.methods.univariate_power_analysis(
+        sample_metadata=metadata_w_data,
+        data_column="alpha_div",
+        group_column="classification",
         alpha=[0.01, 0.05, 0.1],
         total_observations=[20, 30, 40]
     ).power_analysis_results
     return res


-def test_alpha_pa(alpha_artifact, metadata):
-    evident.methods.alpha_power_analysis(
-        alpha_artifact,
-        metadata.get_column("classification"),
-        alpha=[0.05],
+def test_alpha_pa(alpha_artifact, metadata_w_data):
+    evident.methods.univariate_power_analysis(
+        sample_metadata=metadata_w_data,
+        data_column="alpha_div",
+        group_column="classification",
+        alpha=[0.01, 0.05],
         power=[0.8]
-    )
+    ).power_analysis_results.view(pd.DataFrame)


 def test_beta_pa(beta_artifact, metadata):
-    evident.methods.beta_power_analysis(
-        beta_artifact,
-        metadata.get_column("classification"),
+    evident.methods.multivariate_power_analysis(
+        data=beta_artifact,
+        sample_metadata=metadata,
+        group_column="classification",
         alpha=[0.05],
         power=[0.8]
     )


-def test_alpha_effect_size_by_cat(alpha_artifact, metadata):
-    non_pairwise = evident.methods.alpha_effect_size_by_category(
-        alpha_diversity=alpha_artifact,
-        sample_metadata=metadata,
-        columns=["classification", "cd_behavior"]
+def test_univariate_effect_size_by_cat(alpha_artifact, metadata_w_data):
+    exp_cols = ["effect_size", "metric", "column"]
+    non_pairwise = evident.methods.univariate_effect_size_by_category(
+        sample_metadata=metadata_w_data,
+        data_column="alpha_div",
+        group_columns=["classification", "cd_behavior"]
     ).effect_size_results.view(pd.DataFrame)
     assert non_pairwise.shape == (2, 3)
-    assert (non_pairwise.columns == ["effect_size", "metric", "column"]).all()
+    assert (non_pairwise.columns == exp_cols).all()

-    pairwise = evident.methods.alpha_effect_size_by_category(
-        alpha_diversity=alpha_artifact,
-        sample_metadata=metadata,
-        columns=["classification", "cd_behavior"],
+    exp_cols = ["effect_size", "metric", "column", "group_1", "group_2"]
+    pairwise = evident.methods.univariate_effect_size_by_category(
+        sample_metadata=metadata_w_data,
+        data_column="alpha_div",
+        group_columns=["classification", "cd_behavior"],
         pairwise=True
     ).effect_size_results.view(pd.DataFrame)
     assert pairwise.shape == (4, 5)
-    exp_cols = ["effect_size", "metric", "column", "group_1", "group_2"]
     assert (pairwise.columns == exp_cols).all()


-def test_beta_effect_size_by_cat(beta_artifact, metadata):
-    non_pairwise = evident.methods.beta_effect_size_by_category(
-        beta_diversity=beta_artifact,
+def test_multivariate_effect_size_by_cat(beta_artifact, metadata):
+    non_pairwise = evident.methods.multivariate_effect_size_by_category(
+        data=beta_artifact,
         sample_metadata=metadata,
-        columns=["classification", "cd_behavior"]
+        group_columns=["classification", "cd_behavior"]
     ).effect_size_results.view(pd.DataFrame)
     assert non_pairwise.shape == (2, 3)
     assert (non_pairwise.columns == ["effect_size", "metric", "column"]).all()

-    pairwise = evident.methods.beta_effect_size_by_category(
-        beta_diversity=beta_artifact,
+    pairwise = evident.methods.multivariate_effect_size_by_category(
+        data=beta_artifact,
         sample_metadata=metadata,
-        columns=["classification", "cd_behavior"],
+        group_columns=["classification", "cd_behavior"],
         pairwise=True
     ).effect_size_results.view(pd.DataFrame)
     assert pairwise.shape == (4, 5)
@@ -104,20 +117,21 @@ def test_beta_effect_size_by_cat(beta_artifact, metadata):
     assert (pairwise.columns == exp_cols).all()


-def test_alpha_effect_size_by_cat_parallel(alpha_artifact, metadata):
-    non_pairwise = evident.methods.alpha_effect_size_by_category(
-        alpha_diversity=alpha_artifact,
-        sample_metadata=metadata,
-        columns=["classification", "cd_behavior"],
+def test_univariate_effect_size_by_cat_parallel(alpha_artifact,
+                                                metadata_w_data):
+    non_pairwise = evident.methods.univariate_effect_size_by_category(
+        sample_metadata=metadata_w_data,
+        data_column="alpha_div",
+        group_columns=["classification", "cd_behavior"],
         n_jobs=2
     ).effect_size_results.view(pd.DataFrame)
     assert non_pairwise.shape == (2, 3)
     assert (non_pairwise.columns == ["effect_size", "metric", "column"]).all()

-    pairwise = evident.methods.alpha_effect_size_by_category(
-        alpha_diversity=alpha_artifact,
-        sample_metadata=metadata,
-        columns=["classification", "cd_behavior"],
+    pairwise = evident.methods.univariate_effect_size_by_category(
+        sample_metadata=metadata_w_data,
+        data_column="alpha_div",
+        group_columns=["classification", "cd_behavior"],
         pairwise=True,
         n_jobs=2
     ).effect_size_results.view(pd.DataFrame)
@@ -126,20 +140,20 @@ def test_alpha_effect_size_by_cat_parallel(alpha_artifact, metadata):
     assert (pairwise.columns == exp_cols).all()


-def test_beta_effect_size_by_cat_parallel(beta_artifact, metadata):
-    non_pairwise = evident.methods.beta_effect_size_by_category(
-        beta_diversity=beta_artifact,
+def test_multivariate_effect_size_by_cat_parallel(beta_artifact, metadata):
+    non_pairwise = evident.methods.multivariate_effect_size_by_category(
+        data=beta_artifact,
         sample_metadata=metadata,
-        columns=["classification", "cd_behavior"],
+        group_columns=["classification", "cd_behavior"],
         n_jobs=2
     ).effect_size_results.view(pd.DataFrame)
     assert non_pairwise.shape == (2, 3)
     assert (non_pairwise.columns == ["effect_size", "metric", "column"]).all()

-    pairwise = evident.methods.beta_effect_size_by_category(
-        beta_diversity=beta_artifact,
+    pairwise = evident.methods.multivariate_effect_size_by_category(
+        data=beta_artifact,
         sample_metadata=metadata,
-        columns=["classification", "cd_behavior"],
+        group_columns=["classification", "cd_behavior"],
         pairwise=True,
         n_jobs=2
     ).effect_size_results.view(pd.DataFrame)
@@ -194,11 +208,6 @@ def test_alpha_pa_repeated():
     )
     long_data["bad_col"] = bad_col

-    metadata = Metadata(long_data.drop(columns=["diversity"]))
-    alpha_div = Artifact.import_data(
-        "SampleData[AlphaDiversity]", long_data["diversity"],
-    )
-
     exp_power_dict = {
         (2, -0.5): 0.11387136147153543,
         (2, 0): 0.13648071042423104,
@@ -211,9 +220,14 @@
         (5, 0.5): 0.9517932434077899
     }

-    results, = evident.methods.alpha_power_analysis_repeated_measures(
-        alpha_diversity=alpha_div,
-        sample_metadata=metadata,
+    exp_cols = {
+        "alpha", "total_observations", "power", "effect_size", "subjects",
+        "measurements", "epsilon", "correlation", "total_observations",
+        "metric", "column"
+    }
+    results, = evident.methods.univariate_power_analysis_repeated_measures(
+        sample_metadata=Metadata(long_data),
+        data_column="diversity",
         individual_id_column="subject",
         state_column="group",
         subjects=[2, 4, 5],
@@ -222,12 +236,10 @@
         epsilon=[0.1],
         alpha=[0.05]
     )
+
     results_df = results.view(pd.DataFrame)
-    assert set(results_df.columns) == {
-        "alpha", "total_observations", "power", "effect_size", "subjects",
-        "measurements", "epsilon", "correlation", "total_observations",
-        "metric", "column"
-    }
+    assert set(results_df.columns) == exp_cols
+
     for i, row in results_df.iterrows():
         key = row["subjects"], row["correlation"]
         np.testing.assert_almost_equal(
@@ -236,3 +248,27 @@
             decimal=5
         )
     evident.visualizers.visualize_results(results=results)
+
+
+def test_univariate_bad_data(alpha_artifact, metadata, metadata_w_data):
+    with pytest.raises(ValueError) as exc_info:
+        evident.methods.univariate_power_analysis(
+            sample_metadata=metadata,
+            data_column="greninja",
+            group_column="classification",
+            alpha=[0.01, 0.05],
+            power=[0.8]
+        )
+    exp_err_msg = "greninja not found in sample metadata."
+    assert str(exc_info.value) == exp_err_msg
+
+    with pytest.raises(ValueError) as exc_info:
+        evident.methods.univariate_power_analysis(
+            sample_metadata=metadata_w_data,
+            data_column="ibd_subtype",
+            group_column="classification",
+            alpha=[0.01, 0.05],
+            power=[0.8]
+        )
+    exp_err_msg = "Values in data_column must be numeric."
+    assert str(exc_info.value) == exp_err_msg
diff --git a/evident/support_files/main.py b/evident/support_files/main.py
index e59368c..7347dba 100644
--- a/evident/support_files/main.py
+++ b/evident/support_files/main.py
@@ -11,7 +11,7 @@ import seaborn as sns

 from skbio import DistanceMatrix

-from evident import AlphaDiversityHandler, BetaDiversityHandler
+from evident import UnivariateDataHandler, MultivariateDataHandler
 from evident.effect_size import effect_size_by_category

 curr_path = os.path.dirname(__file__)
@@ -25,19 +25,19 @@ multiclass_cols = [col for col in cols if col not in binary_cols]

 data_path = os.path.join(curr_path, "data")

-data_loc = glob.glob(f"{data_path}/diversity*")[0]
-
-if "alpha" in data_loc:
-    alpha_div_data = pd.read_table(data_loc, sep="\t", index_col=0)
-    # Loads as DataFrame. Need to squeeze to Series for ADH.
-    dh = AlphaDiversityHandler(alpha_div_data.squeeze(), md)
-    div_type = "Alpha"
-    ylabel = "Alpha Diversity"
-elif "beta" in data_loc:
-    beta_div_data = DistanceMatrix.read(data_loc)
-    dh = BetaDiversityHandler(beta_div_data, md)
-    div_type = "Beta"
-    ylabel = "Within-Group Distances"
+data_loc = glob.glob(f"{data_path}/data*")[0]
+
+if "univariate" in data_loc:
+    univariate_data = pd.read_table(data_loc, sep="\t", index_col=0)
+    # Loads as DataFrame. Need to squeeze to Series for UDH.
+    dh = UnivariateDataHandler(univariate_data.squeeze(), md)
+    data_type = "Univariate"
+    data_name = univariate_data.squeeze().name
+elif "multivariate" in data_loc:
+    multivariate_data = DistanceMatrix.read(data_loc)
+    dh = MultivariateDataHandler(multivariate_data, md)
+    data_type = "Multivariate"
+    data_name = "Within-Group Distances"
 else:
     raise ValueError("No valid data found!")

@@ -254,9 +254,9 @@ def outliers(grp):
                 line_color="black")
     boxes.ygrid.grid_line_color = None

-    boxes.xaxis.axis_label = ylabel
+    boxes.xaxis.axis_label = data_name
     boxes.title.text = (
-        f"{div_type} Diversity - {chosen_box_col.value}\n"
+        f"{data_type} Data - {chosen_box_col.value}\n"
         f"{metric} = {effect_size:.3f}"
     )
     boxes.title.text_font_size = "10pt"
diff --git a/evident/tests/conftest.py b/evident/tests/conftest.py
index 5022db8..9c43c6b 100644
--- a/evident/tests/conftest.py
+++ b/evident/tests/conftest.py
@@ -4,8 +4,8 @@ import pytest

 from skbio import DistanceMatrix

-from evident.diversity_handler import (AlphaDiversityHandler,
-                                       BetaDiversityHandler)
+from evident.data_handler import (UnivariateDataHandler,
+                                  MultivariateDataHandler)

 NA_VALS = ["missing: not provided", "not applicable"]

@@ -15,7 +15,7 @@ def alpha_mock():
     fname = os.path.join(os.path.dirname(__file__),
                          "data/metadata.tsv")
     df = pd.read_table(fname, sep="\t", index_col=0, na_values=NA_VALS)
-    adh = AlphaDiversityHandler(df["faith_pd"], df)
+    adh = UnivariateDataHandler(df["faith_pd"], df)
     return adh

@@ -26,5 +26,5 @@ def beta_mock():
     dm_file = os.path.join(os.path.dirname(__file__),
                            "data/distance_matrix.lsmat.gz")
     dm = DistanceMatrix.read(dm_file)
-    bdh = BetaDiversityHandler(dm, df)
+    bdh = MultivariateDataHandler(dm, df)
     return bdh
diff --git a/evident/tests/make_interactive.py b/evident/tests/make_interactive.py
index fe41a07..47b31fa 100644
--- a/evident/tests/make_interactive.py
+++ b/evident/tests/make_interactive.py
@@ -4,7 +4,7 @@ import pandas as pd

 from skbio import DistanceMatrix

-from evident import AlphaDiversityHandler, BetaDiversityHandler
+from evident import UnivariateDataHandler, MultivariateDataHandler
 from evident.interactive import create_bokeh_app

@@ -20,11 +20,11 @@ def interactive(output, diversity_type):
     df = df.dropna(axis=0)

     if diversity_type == "alpha":
-        dh = AlphaDiversityHandler(df["faith_pd"], df)
+        dh = UnivariateDataHandler(df["faith_pd"], df)
     elif diversity_type == "beta":
         dm_loc = os.path.join(curr_path, "data/distance_matrix.lsmat.gz")
         dm = DistanceMatrix.read(dm_loc)
-        dh = BetaDiversityHandler(dm, df)
+        dh = MultivariateDataHandler(dm, df)
     else:
         raise ValueError("No valid data!")
diff --git a/evident/tests/test_diversity_handler.py b/evident/tests/test_data_handler.py
similarity index 93%
rename from evident/tests/test_diversity_handler.py
rename to evident/tests/test_data_handler.py
index e561e19..afa90ff 100644
--- a/evident/tests/test_diversity_handler.py
+++ b/evident/tests/test_data_handler.py
@@ -5,8 +5,8 @@ import pytest

 from skbio import DistanceMatrix

-from evident.diversity_handler import (AlphaDiversityHandler,
-                                       BetaDiversityHandler)
+from evident.data_handler import (UnivariateDataHandler,
+                                  MultivariateDataHandler)
 import evident._exceptions as exc

 na_values = ["not applicable"]
@@ -20,7 +20,7 @@ def test_init_alpha_div_handler(self):
             "cd_behavior", "cd_location", "cd_resection", "ibd_subtype",
             "perianal_disease", "sex", "classification"
         ]
-        a = AlphaDiversityHandler(df["faith_pd"], df)
+        a = UnivariateDataHandler(df["faith_pd"], df)
         assert a.metadata.shape == (220, len(exp_cols))
         assert a.data.shape == (220, )

@@ -40,7 +40,7 @@ def test_alpha_wrong_data(self, alpha_mock):
         data = alpha_mock.data.to_frame()

         with pytest.raises(ValueError) as exc_info:
-            AlphaDiversityHandler(data, alpha_mock.metadata)
+            UnivariateDataHandler(data, alpha_mock.metadata)

         exp_err_msg = "data must be of type pandas.Series"
         assert str(exc_info.value) == exp_err_msg
@@ -50,7 +50,7 @@ def test_alpha_data_nan(self, alpha_mock):
         data[0] = np.nan
         data[-1] = np.nan
         with pytest.warns(UserWarning) as warn_info:
-            AlphaDiversityHandler(data, alpha_mock.metadata)
+            UnivariateDataHandler(data, alpha_mock.metadata)

         warn_msg_1 = warn_info[0].message.args[0]
         warn_msg_2 = warn_info[1].message.args[0]
@@ -75,7 +75,7 @@ def test_init_beta_div_handler(self):
         dm_file = os.path.join(os.path.dirname(__file__),
                                "data/distance_matrix.lsmat.gz")
         dm = DistanceMatrix.read(dm_file)
-        b = BetaDiversityHandler(dm, df)
+        b = MultivariateDataHandler(dm, df)
         assert b.metadata.shape == (220, len(exp_cols))
         assert b.data.shape == (220, 220)

@@ -95,7 +95,7 @@ def test_beta_wrong_data(self, beta_mock):
         data = beta_mock.data.to_data_frame()

         with pytest.raises(ValueError) as exc_info:
-            BetaDiversityHandler(data, beta_mock.metadata)
+            MultivariateDataHandler(data, beta_mock.metadata)

         exp_err_msg = "data must be of type skbio.DistanceMatrix"
         assert str(exc_info.value) == exp_err_msg
@@ -170,7 +170,7 @@ def test_alpha_power_err_no_args(self, alpha_mock):
         assert str(exc_info.value) == exp_err_msg

     def test_alpha_power_f(self, alpha_mock, monkeypatch):
-        # Monkey patch Cohen's f calculation directly in diversity_handler
+        # Monkey patch Cohen's f calculation directly in data_handler
         # instead of in _utils. Doesn't really make sense that it has
         # to be done this way but whatever.
         # https://stackoverflow.com/a/45466846
@@ -178,7 +178,7 @@ def mock_cohens_f(*args):
             return 0.4

         monkeypatch.setattr(
-            "evident.diversity_handler.calculate_cohens_f",
+            "evident.data_handler.calculate_cohens_f",
             mock_cohens_f
         )
         calc_power = alpha_mock.power_analysis(
diff --git a/evident/tests/test_effect_size.py b/evident/tests/test_effect_size.py
index 1cc592c..31ba659 100644
--- a/evident/tests/test_effect_size.py
+++ b/evident/tests/test_effect_size.py
@@ -2,7 +2,7 @@ import pandas as pd

 import pytest

-from evident import AlphaDiversityHandler
+from evident import UnivariateDataHandler
 from evident import effect_size as expl

@@ -133,7 +133,7 @@ def test_nan_in_cols():
     faith_vals = pd.Series([1, 3, 4, 5, 6, 6])
     faith_vals.index = df.index

-    adh = AlphaDiversityHandler(faith_vals, df, min_count_per_level=1)
+    adh = UnivariateDataHandler(faith_vals, df, min_count_per_level=1)
     assert not np.isnan(adh.calculate_effect_size("col1").effect_size)
     assert not np.isnan(adh.calculate_effect_size("col2").effect_size)
@@ -147,7 +147,7 @@ def test_nan_in_cols_one_one_cat():
     faith_vals = pd.Series([1, 3, 4, 5, 6, 6])
     faith_vals.index = df.index

-    adh = AlphaDiversityHandler(faith_vals, df, min_count_per_level=1)
+    adh = UnivariateDataHandler(faith_vals, df, min_count_per_level=1)
     assert not np.isnan(adh.calculate_effect_size("col1").effect_size)

     with pytest.raises(KeyError):
diff --git a/evident/tests/test_interactive.py b/evident/tests/test_interactive.py
index 3ee9abe..2f17734 100644
--- a/evident/tests/test_interactive.py
+++ b/evident/tests/test_interactive.py
@@ -33,9 +33,9 @@ def test_interactive(mock, request, tmpdir):
         "data/metadata.tsv"
     }
     if mock == "alpha_mock":
-        exp_files.add("data/diversity.alpha.tsv")
+        exp_files.add("data/data.univariate.tsv")
     else:
-        exp_files.add("data/diversity.beta.lsmat")
+        exp_files.add("data/data.multivariate.lsmat")
     assert files == exp_files

     md = pd.read_table(os.path.join(outdir, "data/metadata.tsv"), sep="\t",
diff --git a/evident/tests/test_rm_anova.py b/evident/tests/test_rm_anova.py
index 37020f1..a4e62db 100644
--- a/evident/tests/test_rm_anova.py
+++ b/evident/tests/test_rm_anova.py
@@ -2,7 +2,7 @@ import pandas as pd

 import pytest

-from evident.diversity_handler import RepeatedMeasuresAlphaDiversityHandler
+from evident.data_handler import RepeatedMeasuresUnivariateDataHandler
 from evident.stats import calculate_eta_squared, calculate_rm_anova_power

@@ -21,7 +21,7 @@ def rm_alpha_mock():
         pd.DataFrame.from_dict(data_dict)
         .reset_index()
         .rename(columns={"index": "group"})
-        .melt(id_vars="group", var_name="subject", value_name="diversity")
+        .melt(id_vars="group", var_name="subject", value_name="values")
     )

     rng = np.random.default_rng()
@@ -34,9 +34,9 @@ def rm_alpha_mock():
         p=[0.25, 0.25, 0.5]
     )
     long_data["bad_col"] = bad_col
-    rmadh = RepeatedMeasuresAlphaDiversityHandler(
-        data=long_data["diversity"],
-        metadata=long_data.drop(columns=["diversity"]),
+    rmadh = RepeatedMeasuresUnivariateDataHandler(
+        data=long_data["values"],
+        metadata=long_data.drop(columns=["values"]),
         individual_id_column="subject",
     )
     return rmadh

@@ -47,7 +47,7 @@ def test_calc_eta_squared(rm_alpha_mock):
         pd.concat([rm_alpha_mock.metadata, rm_alpha_mock.data], axis=1),
         index="subject",
         columns="group",
-        values="diversity"
+        values="values"
     )
     calc_eta_sq = calculate_eta_squared(wide_data)
     np.testing.assert_almost_equal(calc_eta_sq, 0.715, decimal=3)
diff --git a/evident/tests/test_utils.py b/evident/tests/test_utils.py
index 593a1cd..c697591 100644
--- a/evident/tests/test_utils.py
+++ b/evident/tests/test_utils.py
@@ -3,7 +3,7 @@ import pytest

 from evident import utils
-from evident.diversity_handler import AlphaDiversityHandler
+from evident.data_handler import UnivariateDataHandler


 def test_listify():
@@ -32,7 +32,7 @@ def test_check_sample_overlap():
         index=[f"S{i+2}" for i in range(5)]
     )
     with pytest.warns(UserWarning) as warn_info:
-        AlphaDiversityHandler(alpha_div, md)
+        UnivariateDataHandler(alpha_div, md)
     exp_msg = (
         "Data and metadata do not have the same sample IDs. Using 4 samples "
         "common to both."
diff --git a/setup.py b/setup.py
index 41d3d09..7554d3f 100644
--- a/setup.py
+++ b/setup.py
@@ -30,7 +30,7 @@
 """

 classifiers = [s.strip() for s in classes.split("\n") if s]
-description = "Effect size calculations for microbiome diversity data."
+description = "Effect size calculations for microbiome data."

 setup(
     name="evident",
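
A note for reviewers: the README hunks in this patch keep evident's documented defaults of considering only metadata columns with at most 5 levels (`max_levels_per_category`) and ignoring category levels represented by fewer than 3 samples (`min_count_per_level`). As a rough illustration of that filtering rule, here is a pure-pandas sketch — `filter_metadata` is a hypothetical helper for intuition only, not evident's actual implementation:

```python
import pandas as pd


def filter_metadata(md, max_levels_per_category=5, min_count_per_level=3):
    """Sketch of the documented default column filtering (hypothetical)."""
    keep = {}
    for col in md.columns:
        counts = md[col].value_counts()
        # Ignore levels represented by fewer than min_count_per_level samples
        counts = counts[counts >= min_count_per_level]
        # Keep only columns that still have 1..max_levels_per_category levels
        if 0 < len(counts) <= max_levels_per_category:
            keep[col] = md[col].where(md[col].isin(counts.index))
    return pd.DataFrame(keep)


md = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 2,  # level "c" has only 2 samples
    "subject_id": [f"s{i}" for i in range(10)],  # 10 unique levels
})
filtered = filter_metadata(md)
# "subject_id" is dropped entirely; the two "c" samples in "group" become NaN
```

With the defaults above, `subject_id` exceeds the level cap once its singleton levels are ignored, while `group` survives with levels `a` and `b` only.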