-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match feature table and ordination #237
Conversation
This would otherwise lead to situations where mismatching categories would be displayed in the ordination and phylogeny. In principle users should use the rarefied table that was used for generating the ordination i.e. samples should have a 1:1 match. Fixes biocore#204
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElDeveloper thanks, this looks good. @fedarko go ahead and merge this if you don't have any comments.
@ElDeveloper The code changes look fine to me (with one conceptual question which I'll get to below). As I understand it, this PR replaces the I have some concerns about this. For one, I don't think the rarefied table should be used for the sample coloring functionality for Empress -- unless I'm missing something, there isn't a "need" to rarefy the data analogous to how rarefaction is needed for ordinations. Using the rarefied table for coloring the tree will likely result in a decently different coloring than a non-rarefied table, and IMO we should try to preserve all of the data as much as possible. My contention is that it would be better to accept non-rarefied tables in this case (i.e. tables where the ordination might have a subset of the samples in the table, due to these samples being dropped out by rarefaction), and then alter the matching code to just subset the table's samples to those in the ordination (without altering the other samples' abundances). This should probably be accompanied by a warning about these samples being dropped due to not being in the ordination. (Of course, samples in the ordination but not the table should cause an error, as would ordinations where none of the samples are present in the table.) However, if you disagree with this, please let me know -- we can chat. Regardless of what we decide on as the "best practice," I think we should document this somewhere so that users are well-informed. If there isn't a clear best practice, then we should document that. (If we do decide that passing in rarefied tables is a good idea, then the part of the README with the " |
I can see arguments either way — rarefying will certainly affect the visualization, and in a stochastic way, but not rarefying will give too much visual weight to the samples with the largest # sequences, which may be undesirable depending on application. Can we make it a user-defined setting?
… On Jul 8, 2020, at 3:56 PM, Marcus Fedarko ***@***.***> wrote:
@ElDeveloper The code changes look fine to me (with one conceptual question which I'll get to below). As I understand it, this PR replaces the table.qza in the repository with the rarefied table from the moving pictures tutorial?
I have some concerns about this. For one, I don't think the rarefied table should be used for the sample coloring functionality for Empress -- unless I'm missing something, there isn't a "need" to rarefy the data analogous to how rarefaction is needed for ordinations. Using the rarefied table for coloring the tree will likely result in a decently different coloring than a non-rarefied table, and IMO we should try to preserve all of the data as much as possible.
My contention is that it would be better to accept non-rarefied tables in this case (i.e. tables where the ordination might have a subset of the samples in the table, due to these samples being dropped out by rarefaction), and then alter the matching code to just subset the table's samples to those in the ordination (without altering the other samples' abundances). This should probably be accompanied by a warning about these samples being dropped due to not being in the ordination. (Of course, samples in the ordination but not the table should cause an error, as would ordinations where none of the samples are present in the table.)
However, if you disagree with this, please let me know -- we can chat. Regardless of what we decide on as the "best practice," I think we should document this somewhere so that users are well-informed. If there isn't a clear best practice, then we should document that.
(If we do decide that passing in rarefied tables is a good idea, then the part of the README with the "table.qza view | download" stuff should be updated accordingly -- at minimum the links should be updated to point to the rarefied_table.qza file, but ideally the filename should be changed to rarefied_table.qza also. Also IMO it would be best to include both the rarefied and non-rarefied table in the repository, so that the non-rarefied table can be used for the stand-alone visualization.)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
That makes sense, @fedarko and I just chatted about it. We'll document this clearly, and note that it's a user-configurable choice. |
(╯°□°)╯︵ ┻━┻ ┏━┓┏━┓┏━┓ cc @fedarko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ElDeveloper! this looks really great -- I have some suggestions but the core of this is awesome.
From going over the README and the matching code, it looks like #244 will be important here. Would you like for me to try to get a fix for that in today?
Let me know if you'd like to chat about any of this.
empress/tools.py
Outdated
|
||
if ord_ids.issubset(table_ids): | ||
extra = table_ids - ord_ids | ||
if extra and not filter_extra_samples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be clearer to replace if extra and not filter_extra_samples:
with just an if extra:
check, and then below that if not filter_extra_samples:
and else:
branches that handle the two branches here. I think this would make program flow a bit clearer, since extra
has to be truthy in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
Co-authored-by: Marcus Fedarko <[email protected]>
Co-authored-by: Marcus Fedarko <[email protected]>
Not sure why this didn't commit before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ElDeveloper! Looks good to me. If you wouldn't mind I think the README should be updated re: the new table.qza file, but aside from that this seems good to merge.
of the samples in the ordination (if the ordination was made using a *filtered | ||
table*). If you'd like to read more about this, there's some informal | ||
discussion in [pull request 237](https://github.com/biocore/empress/pull/237). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something i just thought of (sorry for not bringing this up earlier), but since the table.qza file is changed in this PR the README should be updated to match that, right? i.e. the "view"/download links should be altered at least (preferably with a sentence or 2 of explaining which table from the MP tutorial this is)
@@ -86,6 +89,7 @@ qiime empress plot \ | |||
--m-sample-metadata-file docs/moving-pictures/sample_metadata.tsv \ | |||
--m-feature-metadata-file docs/moving-pictures/taxonomy.qza \ | |||
--o-visualization docs/moving-pictures/empress-tree-tandem.qzv | |||
--p-filter-extra-samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub isn't letting me suggest it on the correct line, but i think this is missing a backslash after the end of the above line
Thanks @fedarko! |
…e shearing (#247) * BUG: Apply empty feat/samp removal before shearing Closes #244. Just gotta add tests for this now... * TST: work on testing removal func #244 * TST: Fix tests broken due to #244 changes * TST: start on fixing core tests re: empty removal * BUG: fix arg order * MNT: use .drop() instead of .loc in a test quiets pandas about warning * BUG: check for empty samps/feats in ordination And handle this by raising an explanatory error msg. TODO, just gotta fix tests now??? * TST: add tests for ord. checking in empty removal one step closer to #244 >:) * TST: get back to fixing final core tests re pcoa hmmmmm * TST: Fix core tests: make Sample4 nonempty Also fixed the filter_unobserved_features test by updating the metadata accordingly * STY: appease flake8 had to add a noqa for a particularly long url comment also re-commented the writing-dictcode.py-out code from the core tests (my b, shouldn't have committed that) * TST: test empty-samp-in-pcoa case from core tsts Now that this all is mostly done, we can close #244 * MNT: don't remove empty features in ord matching instead, defer this to remove_empty_...(), later on. per #244. * DOC: update readme about #244/#237 complications * Update empress/compression_utils.py Co-authored-by: Yoshiki Vázquez Baeza <[email protected]> * Update tests/python/test_compression_utils.py Co-authored-by: Yoshiki Vázquez Baeza <[email protected]> * Update tests/python/test_compression_utils.py Co-authored-by: Yoshiki Vázquez Baeza <[email protected]> Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
This would otherwise lead to situations where mismatching categories
would be displayed in the ordination and phylogeny. In principle users
should use the rarefied table that was used for generating the
ordination i.e. samples should have a 1:1 match.