Filter out discriminatory taxa in utils.py, add in README that QADABRA is WIP software, add install from source instructions in README #53

Merged: 7 commits, Jan 21, 2024
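The filtering this PR adds to `_validate_input` in qadabra/utils.py can be approximated with pandas alone. The sketch below is illustrative, not the PR's code: the helper name `find_discriminating_features` and the sample data are hypothetical, the table is assumed to be a samples-by-features DataFrame of counts, and the final drop uses `DataFrame.drop` where the PR filters a BIOM table.

```python
import pandas as pd


def find_discriminating_features(table, metadata, factor_name):
    """Return features that are absent (zero total count) in at least one group.

    Hypothetical helper. `table` is samples x features; `metadata` is
    indexed by the same sample IDs and contains the `factor_name` column.
    """
    joint = table.join(metadata[[factor_name]])
    # Total count of each feature within each factor group
    group_sums = joint.groupby(factor_name).sum(numeric_only=True)
    # True for features with a nonzero total in every group
    present_in_all = group_sums.apply(lambda col: col.all())
    return present_in_all[~present_in_all].index.tolist()


# Made-up example data: featA occurs only in "case", featC only in "ctrl"
samples = ["s1", "s2", "s3", "s4"]
table = pd.DataFrame(
    {"featA": [1, 2, 0, 0], "featB": [1, 0, 3, 1], "featC": [0, 0, 5, 2]},
    index=samples,
)
metadata = pd.DataFrame({"group": ["case", "case", "ctrl", "ctrl"]}, index=samples)

discriminating = find_discriminating_features(table, metadata, "group")
filtered = table.drop(columns=discriminating)
```

A feature is flagged whenever its summed count is zero in any one group, which is the same presence check the diff performs on `gb` before filtering.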
4 changes: 3 additions & 1 deletion README.md
@@ -10,6 +10,8 @@ Importantly, Qadabra focuses on both FDR corrected p-values *and* [feature ranks

![Schematic](images/Qadabra_schematic.svg)

Please note this software is currently a work in progress. Your patience is appreciated as we continue to develop and enhance its features. Please leave an issue on GitHub should you run into any errors.

## Installation
```
pip install qadabra
@@ -97,7 +99,7 @@ This will create a zipped directory containing the report.
Unzip this file and open the `report.html` file to view the report containing results and visualizations in your browser.

## Tutorial
-See the [tutorial](tutorial.md) page for a walkthroughon using Qadabra workflow with a microbiome dataset.
+See the [tutorial](tutorial.md) page for a walkthrough on using Qadabra workflow with a microbiome dataset.

## FAQs
Coming soon: An [FAQs](FAQs.md) page of commonly asked question on the statistics and code pertaining to Qadabra.
21 changes: 14 additions & 7 deletions qadabra/utils.py
@@ -4,7 +4,7 @@

import biom
import pandas as pd

import warnings

def _validate_input(
logger: logging.Logger,
@@ -51,11 +51,18 @@ def _validate_input(
joint_df = tbl_df.join(md)
gb = joint_df.groupby(factor_name).sum(numeric_only=True)
feat_presence = gb.apply(lambda x: x.all())
-if not feat_presence.all():
-raise ValueError(
-"Some taxa in the table perfectly discriminate factor groups. "
-"Please filter out these taxa before running Qadabra."
-)

+discriminating_feats = feat_presence[~feat_presence].index.tolist()

+if len(discriminating_feats) > 0:
+warning_msg = "Some features in the table perfectly discriminate factor groups:\n" + '\n'.join(discriminating_feats) + ".\nAutomatically filtering out these features before running Qadabra..."
+print("Number of discriminating features: " + str(len(discriminating_feats)))
+warnings.warn(warning_msg, category=Warning)

+# Filtering out the discriminating features from the BIOM table
+tbl = tbl.filter(lambda value, id_, metadata: id_ not in discriminating_feats, axis='observation', inplace=False)
+logger.info(f"Table shape after filtering: {tbl.shape}")


if tree:
from bp import parse_newick, to_skbio_treenode
@@ -69,4 +76,4 @@ def _validate_input(
raise ValueError("Tree tips are not a subset of table features!")
else:
logger.info("Reading phylogenetic tree...")
-logger.info("(Optional tree file not provided. Skipping tree validation.)")
+logger.info("(Optional tree file not provided. Skipping tree validation.)")