Document QIIME 2 metadata merging complications #393
Simpler example, involving merging a taxonomy.qza and feature_importance.qza file:
from qiime2 import Artifact
import pandas as pd
fi = Artifact.load("feature_importance.qza").view(pd.DataFrame)
tax = Artifact.load("taxonomy.qza").view(pd.DataFrame)
merged_df = pd.concat([tax, fi], axis=1, sort=False)
# Assign index a name to allow us to use this as a Q2 feature metadata file
merged_df.index.name = "FeatureID"
# Missing values are, by default, represented as NaNs.
# .to_csv() represents them in the TSV as empty values by default (see the
# na_rep parameter:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)
merged_df.to_csv("merged_fm.tsv", sep="\t")
After this, the |
When multiple sample* / feature metadata files are provided to Empress through QIIME 2, they're merged in such a way that only the IDs shared across all metadata files are kept (i.e. an inner join on the ID column). See here for details.
The problem is that this can rapidly reduce the amount of metadata passed to Empress: the Q2 tutorial feature_importance.qza only contains 566 features, while the taxonomy.qza contains 770 features. The merged result can contain at most 566 features, so passing both in to Empress will "remove" taxonomy data for at least 204 features, making taxonomy coloring look a lot sparser.
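To make the difference concrete, here is a minimal, dependency-free sketch of the two merge behaviors on toy data. The feature IDs, taxonomy strings, importance values, and the `merge_metadata` helper are all made up for illustration; this is not QIIME 2's or Empress's actual code. In pandas, I believe the same distinction corresponds to `join="inner"` versus the default `join="outer"` in `pd.concat(..., axis=1)`.

```python
# Toy feature metadata keyed by feature ID. All IDs and values are invented.
taxonomy = {
    "f1": {"Taxon": "k__Bacteria; p__Firmicutes"},
    "f2": {"Taxon": "k__Bacteria; p__Bacteroidetes"},
    "f3": {"Taxon": "k__Bacteria; p__Proteobacteria"},
}
importance = {
    "f1": {"importance": 0.7},
    "f3": {"importance": 0.3},
}


def merge_metadata(tables, how):
    """Merge a list of {feature_id: {column: value}} dicts (hypothetical helper).

    how="inner" keeps only IDs present in every table;
    how="outer" keeps IDs present in any table and fills gaps with None.
    """
    id_sets = [set(table) for table in tables]
    if how == "inner":
        ids = set.intersection(*id_sets)
    elif how == "outer":
        ids = set.union(*id_sets)
    else:
        raise ValueError(f"unknown merge type: {how!r}")

    # Collect column names in first-seen order, without duplicates.
    columns = []
    for table in tables:
        for row in table.values():
            for column in row:
                if column not in columns:
                    columns.append(column)

    merged = {}
    for feature_id in ids:
        row = {column: None for column in columns}
        for table in tables:
            row.update(table.get(feature_id, {}))
        merged[feature_id] = row
    return merged


inner = merge_metadata([taxonomy, importance], how="inner")
outer = merge_metadata([taxonomy, importance], how="outer")

print(sorted(inner))  # f2 is dropped: it has taxonomy but no importance score
print(sorted(outer))  # every feature is kept
print(outer["f2"])    # taxonomy is preserved, importance is None
```

With the inner merge, `f2` loses its taxonomy annotation just because it has no importance score, which is the sparsity problem described above. The outer merge keeps it and leaves the missing value empty.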
Since it might be a while until there is built-in QIIME 2 support for other merging methods, in the interim we should ideally:
For task 2, here is a rough transcript of the code I used to merge the feature metadata files in this directory:
Should be decent enough.
* I think this might impact sample metadata files too, but feature metadata files are the bigger problem right now.