Match feature table and ordination #237

ElDeveloper · 2020-07-08T02:16:50Z

This would otherwise lead to situations where mismatching categories
would be displayed in the ordination and phylogeny. In principle users
should use the rarefied table that was used for generating the
ordination i.e. samples should have a 1:1 match.

Fixes biocore#204

kwcantrell

@ElDeveloper thanks, this looks good. @fedarko go ahead and merge this if you don't have any comments.

fedarko · 2020-07-08T22:56:12Z

@ElDeveloper The code changes look fine to me (with one conceptual question which I'll get to below). As I understand it, this PR replaces the table.qza in the repository with the rarefied table from the moving pictures tutorial?

I have some concerns about this. For one, I don't think the rarefied table should be used for the sample coloring functionality for Empress -- unless I'm missing something, there isn't a "need" to rarefy the data analogous to how rarefaction is needed for ordinations. Using the rarefied table for coloring the tree will likely result in a decently different coloring than a non-rarefied table, and IMO we should try to preserve all of the data as much as possible.

My contention is that it would be better to accept non-rarefied tables in this case (i.e. tables where the ordination might have a subset of the samples in the table, due to these samples being dropped out by rarefaction), and then alter the matching code to just subset the table's samples to those in the ordination (without altering the other samples' abundances). This should probably be accompanied by a warning about these samples being dropped due to not being in the ordination. (Of course, samples in the ordination but not the table should cause an error, as would ordinations where none of the samples are present in the table.)

However, if you disagree with this, please let me know -- we can chat. Regardless of what we decide on as the "best practice," I think we should document this somewhere so that users are well-informed. If there isn't a clear best practice, then we should document that.

(If we do decide that passing in rarefied tables is a good idea, then the part of the README with the "table.qza view | download" stuff should be updated accordingly -- at minimum the links should be updated to point to the rarefied_table.qza file, but ideally the filename should be changed to rarefied_table.qza also. Also IMO it would be best to include both the rarefied and non-rarefied table in the repository, so that the non-rarefied table can be used for the stand-alone visualization.)
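The matching behavior proposed in this comment could be sketched roughly as follows. This is only an illustration using the standard library, not the code in this PR: the function name `match_table_to_ordination` and the dict-based table layout (sample ID to per-feature counts) are assumptions made for the example.

```python
import warnings


def match_table_to_ordination(table, ord_ids):
    """Subset `table` to the samples present in the ordination.

    Hypothetical helper: `table` maps sample ID -> {feature ID: count},
    and `ord_ids` is an iterable of the ordination's sample IDs.
    """
    ord_ids = set(ord_ids)
    table_ids = set(table)

    # An ordination that shares no samples with the table is an error.
    if not ord_ids & table_ids:
        raise ValueError(
            "None of the samples in the ordination are present in the table."
        )

    # Samples in the ordination but not in the table are also an error.
    missing = ord_ids - table_ids
    if missing:
        raise ValueError(
            "Samples in the ordination but not in the table: "
            + ", ".join(sorted(missing))
        )

    # Samples only in the table are dropped, with a warning.
    extra = table_ids - ord_ids
    if extra:
        warnings.warn(
            f"Dropping {len(extra)} sample(s) that are not in the "
            "ordination: " + ", ".join(sorted(extra))
        )

    # The abundances of the remaining samples are left untouched.
    return {sid: counts for sid, counts in table.items() if sid in ord_ids}
```

Called with a three-sample table and an ordination containing two of those samples, this returns the two shared samples unchanged and emits a single warning naming the dropped one.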

rob-knight · 2020-07-08T23:27:06Z

I can see arguments either way — rarefying will certainly affect the visualization, and in a stochastic way, but not rarefying will give too much visual weight to the samples with the largest # sequences, which may be undesirable depending on application. Can we make it a user-defined setting?
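To make this trade-off concrete, here is a minimal sketch of rarefaction as subsampling reads without replacement to a fixed depth. It uses only the Python standard library; the function name `rarefy`, the `counts` layout (feature ID to read count) and the `seed` parameter are illustrative assumptions, not Empress or QIIME 2 code.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample one sample's reads without replacement.

    Hypothetical helper: `counts` maps feature ID -> read count and
    `depth` is the number of reads to keep.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(
            f"Cannot rarefy to {depth} reads: sample only has {total}."
        )

    # Expand the counts into one entry per observed read.
    pool = [feature for feature, n in counts.items() for _ in range(n)]

    # Draw `depth` reads at random, then count them per feature again.
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    return dict(Counter(drawn))
```

Every rarefied sample ends up with exactly `depth` reads, so deeply sequenced samples no longer carry extra weight. Because the draw is random, different seeds can give different per-feature counts for the same input, which is the stochastic effect mentioned above.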

ElDeveloper · 2020-07-09T00:10:07Z

That makes sense. @fedarko and I just chatted about it. We'll document this clearly, and note that it's a user-configurable choice.

@fedarko

(╯°□°)╯︵ ┻━┻ ┏━┓┏━┓┏━┓ cc @fedarko

fedarko

Thanks @ElDeveloper! This looks really great -- I have some suggestions but the core of this is awesome.

From going over the README and the matching code, it looks like #244 will be important here. Would you like for me to try to get a fix for that in today?

Let me know if you'd like to chat about any of this.

README.md

empress/plugin_setup.py

empress/core.py

empress/tools.py

fedarko · 2020-07-09T21:54:29Z

empress/tools.py


+        if ord_ids.issubset(table_ids):
+            extra = table_ids - ord_ids
+            if extra and not filter_extra_samples:


I think it'd be clearer to replace `if extra and not filter_extra_samples:` with just an `if extra:` check, and then nest `if not filter_extra_samples:` and `else:` branches below it to handle the two cases. I think this would make the program flow a bit clearer, since `extra` has to be truthy in both cases.
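The restructuring suggested here could look something like the sketch below. Only the names `ord_ids`, `table_ids`, `extra` and `filter_extra_samples` come from the diff above; the function name `samples_to_keep` and what each branch actually does (raising an error versus dropping the extra samples) are assumptions made for illustration.

```python
def samples_to_keep(table_ids, ord_ids, filter_extra_samples=False):
    """Return the sample IDs that remain after matching.

    Hypothetical helper illustrating the nested control flow only.
    """
    table_ids, ord_ids = set(table_ids), set(ord_ids)

    if not ord_ids.issubset(table_ids):
        raise ValueError("The ordination has samples missing from the table.")

    extra = table_ids - ord_ids
    if extra:
        # Both branches below only run when there are extra samples.
        if not filter_extra_samples:
            # Assumed behavior: refuse to continue unless the user opts in.
            raise ValueError(
                "The table has samples that are not in the ordination; "
                "set filter_extra_samples=True to drop them."
            )
        else:
            # Assumed behavior: drop the samples the ordination lacks.
            table_ids = table_ids - extra

    return table_ids
```

With this shape, the no-extra-samples case falls straight through to the return, and the reader does not have to work out that `extra` is non-empty in a later branch.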

fedarko

Thanks @ElDeveloper! Looks good to me. If you wouldn't mind I think the README should be updated re: the new table.qza file, but aside from that this seems good to merge.

fedarko · 2020-07-09T23:40:18Z

README.md

+of the samples in the ordination (if the ordination was made using a *filtered
+table*). If you'd like to read more about this, there's some informal
+discussion in [pull request 237](https://github.com/biocore/empress/pull/237).
+


Something I just thought of (sorry for not bringing this up earlier): since the table.qza file is changed in this PR, the README should be updated to match that, right? I.e. the "view"/download links should be altered at least (preferably with a sentence or two explaining which table from the MP tutorial this is).

empress/tools.py

fedarko · 2020-07-10T00:06:00Z

README.md

@@ -86,6 +89,7 @@ qiime empress plot \
    --m-sample-metadata-file docs/moving-pictures/sample_metadata.tsv \
    --m-feature-metadata-file docs/moving-pictures/taxonomy.qza \
    --o-visualization docs/moving-pictures/empress-tree-tandem.qzv
+    --p-filter-extra-samples


GitHub isn't letting me suggest it on the correct line, but I think the `--o-visualization` line above is missing a trailing backslash.

ElDeveloper · 2020-07-10T00:29:28Z

Thanks @fedarko!

…e shearing (#247)

* BUG: Apply empty feat/samp removal before shearing Closes #244. Just gotta add tests for this now...
* TST: work on testing removal func #244
* TST: Fix tests broken due to #244 changes
* TST: start on fixing core tests re: empty removal
* BUG: fix arg order
* MNT: use .drop() instead of .loc in a test quiets pandas about warning
* BUG: check for empty samps/feats in ordination And handle this by raising an explanatory error msg. TODO, just gotta fix tests now???
* TST: add tests for ord. checking in empty removal one step closer to #244 >:)
* TST: get back to fixing final core tests re pcoa hmmmmm
* TST: Fix core tests: make Sample4 nonempty Also fixed the filter_unobserved_features test by updating the metadata accordingly
* STY: appease flake8 had to add a noqa for a particularly long url comment also re-commented the writing-dictcode.py-out code from the core tests (my b, shouldn't have committed that)
* TST: test empty-samp-in-pcoa case from core tsts Now that this all is mostly done, we can close #244
* MNT: don't remove empty features in ord matching instead, defer this to remove_empty_...(), later on. per #244.
* DOC: update readme about #244/#237 complications
* Update empress/compression_utils.py Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
* Update tests/python/test_compression_utils.py Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
* Update tests/python/test_compression_utils.py Co-authored-by: Yoshiki Vázquez Baeza <[email protected]> Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

ElDeveloper added 2 commits July 7, 2020 19:12

Merge branch 'master' of github.com:biocore/empress into issue-204

6ab002e

kwcantrell reviewed Jul 8, 2020

Add vim files to gitignore and update table

ba68daf

ElDeveloper mentioned this pull request Jul 8, 2020

Reveal selected samples from a selected node in a tandem plot #243

Merged

ElDeveloper requested a review from fedarko July 8, 2020 21:26

fedarko mentioned this pull request Jul 9, 2020

Move empty sample / feature removal from the table to before tree shearing #244

Closed

ElDeveloper added 3 commits July 9, 2020 10:37

DOC: Add note about tables

3b86534

(╯°□°)╯︵ ┻━┻ ┏━┓┏━┓┏━┓ cc @fedarko

BLD: Fix plugin setup

478710e

TST: Add missing parameter

2ec2d6e

fedarko requested changes Jul 9, 2020

ElDeveloper and others added 7 commits July 9, 2020 16:24

Improve the flow of the code

ca353cb

Apply suggestions from code review

a6308ea

Co-authored-by: Marcus Fedarko <[email protected]>

Merge branch 'issue-204' of github.com:ElDeveloper/empress into issue…

2a7c492

…-204

Apply suggestions from code review

41fc8ae

Co-authored-by: Marcus Fedarko <[email protected]>

DOC: Add comment from @fedarko

75cfdda

Not sure why this didn't commit before

Merge branch 'issue-204' of github.com:ElDeveloper/empress into issue…

c7d1abc

…-204

DOC: Add note about pandas 🐼

96d7944

fedarko reviewed Jul 9, 2020

DOC: Document the table and flag

30e8514

fedarko reviewed Jul 10, 2020

Missing slash

733bf62

fedarko merged commit aa40f2b into biocore:master Jul 10, 2020

fedarko added a commit to fedarko/empress that referenced this pull request Jul 10, 2020

DOC: update readme about biocore#244/biocore#237 complications

0a46de4
