Qadabra is a Snakemake workflow for running and comparing several differential abundance (DA) methods on the same microbiome dataset.
Importantly, Qadabra focuses on both FDR-corrected p-values and feature ranks, and it generates visualizations of the differential abundance results.
pip install qadabra
Qadabra requires the following dependencies:
- snakemake
- click
- biom-format
- pandas
- numpy
- cython
- iow
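To confirm that the dependencies above are available in the current environment, a check along the following lines can help. This is a minimal sketch, not part of Qadabra: the function name missing_dependencies is hypothetical, and the import names in the mapping are assumptions (in particular, biom-format is assumed to import as biom, cython as Cython, and iow as iow).

```python
from importlib.util import find_spec

# Assumed mapping from package name (as listed above) to import name.
# The import names are assumptions; adjust them if your environment differs.
REQUIRED_MODULES = {
    "snakemake": "snakemake",
    "click": "click",
    "biom-format": "biom",
    "pandas": "pandas",
    "numpy": "numpy",
    "cython": "Cython",
    "iow": "iow",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```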
Qadabra can be used on multiple datasets at once. First, we want to create the workflow directory to perform differential abundance with all methods:
qadabra create-workflow --workflow-dest <directory_name>
This command will initialize the workflow, but we still need to point to our dataset(s) of interest.
We can add datasets one-by-one with the add-dataset command:
qadabra add-dataset \
--workflow-dest <directory_name> \
--table <directory_name>/data/table.biom \
--metadata <directory_name>/data/metadata.tsv \
--tree <directory_name>/data/my_tree.nwk \
--name my_dataset \
--factor-name case_control \
--target-level case \
--reference-level control \
--confounder <confounding_var> \
--verbose
Let's walk through the arguments provided here, which represent the inputs to Qadabra:
- workflow-dest: The location of the workflow that we created earlier
- table: Feature table (features by samples) in BIOM format
- metadata: Sample metadata in TSV format
- tree: Phylogenetic tree in .nwk or other tree format (optional)
- name: Name to give this dataset
- factor-name: Metadata column to use for differential abundance
- target-level: The value in the chosen factor to use as the target
- reference-level: The reference level to which we want to compare our target
- confounder: Any confounding variable metadata columns (optional)
- verbose: Flag to show all preprocessing performed by Qadabra
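Before running add-dataset, it can be useful to confirm that the metadata file actually contains the factor column and both levels. The following is a minimal sketch using only the Python standard library. It is not Qadabra's own validation: the function name check_factor_levels is hypothetical, and it assumes the metadata TSV has a header row.

```python
import csv


def check_factor_levels(metadata_path, factor_name, target_level, reference_level):
    """Return a list of problems found in the metadata; an empty list means none."""
    with open(metadata_path, newline="") as handle:
        # Assumption: the first row of the TSV holds the column names.
        reader = csv.DictReader(handle, delimiter="\t")
        if factor_name not in (reader.fieldnames or []):
            return [f"column {factor_name!r} not found in metadata"]
        # Collect every distinct value observed in the factor column.
        levels = {row[factor_name] for row in reader}

    problems = []
    for label, level in (("target", target_level), ("reference", reference_level)):
        if level not in levels:
            problems.append(
                f"{label} level {level!r} not found in column {factor_name!r}"
            )
    return problems
```

For the example command above, this would be called as check_factor_levels("metadata.tsv", "case_control", "case", "control"), with the path adjusted to wherever the metadata file lives.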
Your dataset should now be added as a line in my_qadabra/config/datasets.tsv.
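To see which datasets have been registered so far, the file can be read back directly. This is a minimal sketch: the helper name read_dataset_rows is hypothetical, and because the column layout of datasets.tsv is not described here, the sketch returns raw rows rather than assuming any column names or a header.

```python
import csv
from pathlib import Path


def read_dataset_rows(workflow_dir):
    """Return the non-empty rows of config/datasets.tsv as lists of strings."""
    datasets_file = Path(workflow_dir) / "config" / "datasets.tsv"
    with open(datasets_file, newline="") as handle:
        # No header is assumed; every non-empty line is returned as-is.
        return [row for row in csv.reader(handle, delimiter="\t") if row]


if __name__ == "__main__":
    workflow = Path("my_qadabra")  # example workflow directory from the text
    if (workflow / "config" / "datasets.tsv").exists():
        for row in read_dataset_rows(workflow):
            print(row)
```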
You can use qadabra add-dataset --help for more details.
To add another dataset, just run this command again with the new dataset information.
The previous commands will create a subdirectory (my_qadabra in this example) in which the workflow structure is contained.
From the command line, execute the following to start the workflow:
snakemake --use-conda --cores <number of cores preferred> <other options>
Please read the Snakemake documentation for how to run Snakemake best on your system.
When this process is completed, you should have directories figures, results, and log.
Each of these directories will have a separate folder for each dataset you added.
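A quick way to confirm that every dataset produced output is to list the per-dataset folders under each of the three directories. The sketch below assumes only the layout described above; the function name datasets_per_output_dir is hypothetical, and the contents of the individual dataset folders are not inspected.

```python
from pathlib import Path

OUTPUT_DIRS = ("figures", "results", "log")


def datasets_per_output_dir(workflow_dir):
    """Map each output directory to the sorted names of its dataset folders."""
    summary = {}
    for name in OUTPUT_DIRS:
        out_dir = Path(workflow_dir) / name
        if out_dir.is_dir():
            summary[name] = sorted(p.name for p in out_dir.iterdir() if p.is_dir())
        else:
            # Directory is absent, e.g. the workflow has not finished yet.
            summary[name] = None
    return summary


if __name__ == "__main__":
    for directory, datasets in datasets_per_output_dir("my_qadabra").items():
        print(directory, "->", datasets)
```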
After Qadabra has finished running, you can generate a Snakemake report of the workflow with the following command:
snakemake --report report.zip
This will create a zipped directory containing the report.
Unzip this file and open the report.html file to view the report containing results and visualizations in your browser.
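The unzip step can also be scripted. The following is a minimal sketch using Python's standard zipfile module; the function name extract_report is hypothetical, and because the exact folder layout inside the archive is not assumed, the sketch searches the extracted files for report.html.

```python
import zipfile
from pathlib import Path


def extract_report(zip_path, dest_dir):
    """Unzip the report archive and return the path to report.html, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Search recursively, since the archive may nest files in a subfolder.
    matches = sorted(Path(dest_dir).rglob("report.html"))
    return matches[0] if matches else None


if __name__ == "__main__":
    if Path("report.zip").exists():
        print("Report extracted to:", extract_report("report.zip", "report_unzipped"))
```

The returned path can then be opened in a browser, for example with webbrowser.open(path.resolve().as_uri()).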
See the tutorial page for a walkthrough on using the Qadabra workflow with a microbiome dataset.
Coming soon: An FAQ page of commonly asked questions on the statistics and code pertaining to Qadabra.
The manuscript for Qadabra is currently in progress. Please cite this GitHub page if Qadabra is used for your analysis. This project is licensed under the BSD-3 License. See the license file for details.