Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match feature table and ordination #237

Merged
merged 15 commits into from
Jul 10, 2020
Merged

Conversation

ElDeveloper
Copy link
Member

This would otherwise lead to situations where mismatching categories
would be displayed in the ordination and phylogeny. In principle users
should use the rarefied table that was used for generating the
ordination i.e. samples should have a 1:1 match.

This would otherwise lead to situations where mismatching categories
would be displayed in the ordination and phylogeny. In principle users
should use the rarefied table that was used for generating the
ordination i.e. samples should have a 1:1 match.

Fixes biocore#204
Copy link
Collaborator

@kwcantrell kwcantrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElDeveloper thanks, this looks good. @fedarko go ahead and merge this if you don't have any comments.

@fedarko
Copy link
Collaborator

fedarko commented Jul 8, 2020

@ElDeveloper The code changes look fine to me (with one conceptual question which I'll get to below). As I understand it, this PR replaces the table.qza in the repository with the rarefied table from the moving pictures tutorial?

I have some concerns about this. For one, I don't think the rarefied table should be used for the sample coloring functionality for Empress -- unless I'm missing something, there isn't a "need" to rarefy the data analogous to how rarefaction is needed for ordinations. Using the rarefied table for coloring the tree will likely result in a decently different coloring than a non-rarefied table, and IMO we should try to preserve all of the data as much as possible.

My contention is that it would be better to accept non-rarefied tables in this case (i.e. tables where the ordination might have a subset of the samples in the table, due to these samples being dropped out by rarefaction), and then alter the matching code to just subset the table's samples to those in the ordination (without altering the other samples' abundances). This should probably be accompanied by a warning about these samples being dropped due to not being in the ordination. (Of course, samples in the ordination but not the table should cause an error, as would ordinations where none of the samples are present in the table.)

However, if you disagree with this, please let me know -- we can chat. Regardless of what we decide on as the "best practice," I think we should document this somewhere so that users are well-informed. If there isn't a clear best practice, then we should document that.

(If we do decide that passing in rarefied tables is a good idea, then the part of the README with the "table.qza view | download" stuff should be updated accordingly -- at minimum the links should be updated to point to the rarefied_table.qza file, but ideally the filename should be changed to rarefied_table.qza also. Also IMO it would be best to include both the rarefied and non-rarefied table in the repository, so that the non-rarefied table can be used for the stand-alone visualization.)

@rob-knight
Copy link

rob-knight commented Jul 8, 2020 via email

@ElDeveloper
Copy link
Member Author

That makes sense, @fedarko and I just chatted about it. We'll document this clearly, and note that it's a user-configurable choice.

Copy link
Collaborator

@fedarko fedarko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ElDeveloper! this looks really great -- I have some suggestions but the core of this is awesome.

From going over the README and the matching code, it looks like #244 will be important here. Would you like for me to try to get a fix for that in today?

Let me know if you'd like to chat about any of this.

README.md Outdated Show resolved Hide resolved
empress/plugin_setup.py Outdated Show resolved Hide resolved
empress/core.py Outdated Show resolved Hide resolved
empress/tools.py Outdated Show resolved Hide resolved
empress/tools.py Outdated

if ord_ids.issubset(table_ids):
extra = table_ids - ord_ids
if extra and not filter_extra_samples:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be clearer to replace if extra and not filter_extra_samples: with just an if extra: check, and then below that if not filter_extra_samples: and else: branches that handle the two branches here. I think this would make program flow a bit clearer, since extra has to be truthy in both cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

empress/tools.py Show resolved Hide resolved
empress/tools.py Outdated Show resolved Hide resolved
empress/tools.py Outdated Show resolved Hide resolved
empress/tools.py Show resolved Hide resolved
Copy link
Collaborator

@fedarko fedarko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ElDeveloper! Looks good to me. If you wouldn't mind I think the README should be updated re: the new table.qza file, but aside from that this seems good to merge.

of the samples in the ordination (if the ordination was made using a *filtered
table*). If you'd like to read more about this, there's some informal
discussion in [pull request 237](https://github.com/biocore/empress/pull/237).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something i just thought of (sorry for not bringing this up earlier), but since the table.qza file is changed in this PR the README should be updated to match that, right? i.e. the "view"/download links should be altered at least (preferably with a sentence or 2 of explaining which table from the MP tutorial this is)

empress/tools.py Show resolved Hide resolved
empress/tools.py Show resolved Hide resolved
@@ -86,6 +89,7 @@ qiime empress plot \
--m-sample-metadata-file docs/moving-pictures/sample_metadata.tsv \
--m-feature-metadata-file docs/moving-pictures/taxonomy.qza \
--o-visualization docs/moving-pictures/empress-tree-tandem.qzv
--p-filter-extra-samples
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub isn't letting me suggest it on the correct line, but i think this is missing a backslash after the end of the above line

@fedarko fedarko merged commit aa40f2b into biocore:master Jul 10, 2020
@ElDeveloper
Copy link
Member Author

Thanks @fedarko!

fedarko added a commit to fedarko/empress that referenced this pull request Jul 10, 2020
ElDeveloper added a commit that referenced this pull request Jul 10, 2020
…e shearing (#247)

* BUG: Apply empty feat/samp removal before shearing

Closes #244. Just gotta add tests for this now...

* TST: work on testing removal func #244

* TST: Fix tests broken due to #244 changes

* TST: start on fixing core tests re: empty removal

* BUG: fix arg order

* MNT: use .drop() instead of .loc in a test

quiets pandas about warning

* BUG: check for empty samps/feats in ordination

And handle this by raising an explanatory error msg.

TODO, just gotta fix tests now???

* TST: add tests for ord. checking in empty removal

one step closer to #244 >:)

* TST: get back to fixing final core tests re pcoa

hmmmmm

* TST: Fix core tests: make Sample4 nonempty

Also fixed the filter_unobserved_features test by updating the
metadata accordingly

* STY: appease flake8

had to add a noqa for a particularly long url comment

also re-commented the writing-dictcode.py-out code from the core
tests (my b, shouldn't have committed that)

* TST: test empty-samp-in-pcoa case from core tsts

Now that this all is mostly done, we can close #244

* MNT: don't remove empty features in ord matching

instead, defer this to remove_empty_...(), later on. per #244.

* DOC: update readme about #244/#237 complications

* Update empress/compression_utils.py

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

* Update tests/python/test_compression_utils.py

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

* Update tests/python/test_compression_utils.py

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants