Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Empress.Tree subclass bp.Tree rather than skbio.TreeNode #155

Closed
fedarko opened this issue Apr 8, 2020 · 2 comments
Closed

Make Empress.Tree subclass bp.Tree rather than skbio.TreeNode #155

fedarko opened this issue Apr 8, 2020 · 2 comments

Comments

@fedarko
Copy link
Collaborator

fedarko commented Apr 8, 2020

@ElDeveloper and @antgonza have both brought up (1, 2) that the current workflow for creating an Empress Tree in empress/_plot.py is a bit inefficient, since the input newick file goes through the following steps:

  1. Read using bp.parse_newick()
  2. Converted to a scikit-bio TreeNode using bp.to_skbio_treenode()
  3. Converted to an Empress Tree using empress.tree.Tree.from_tree()

If we could cut out the scikit-bio step and just rely on the bp implementation, this would likely speed things up.

That being said, IMO the more worrying bottlenecks are on the JavaScript side of things -- optimizing that side will likely be more of a challenge than optimizing the Python side of things, so I'd argue that that is where we should focus efforts on optimization for the time being. (But obviously making either side faster is always nicer :)

Related issues: #151

(I haven't mentioned it here but there are similar concerns with how sample metadata is treated, since it goes from qiime2.Metadata to pd.DataFrame to dict.)

fedarko added a commit to fedarko/empress that referenced this issue Apr 17, 2020
ElDeveloper added a commit that referenced this issue Apr 20, 2020
… ok (#154)

* BUG/TST: Add back in data matching/checking code

Closes #139, for real this time.

Eventually we'll need to check that feature metadata matches up, but
that is its own problem for later down the road.

* STY: fix flake8 complaint

* DOC: add all needed moving pix files & "make docs"

Not sure why these files weren't here before, but this will make
rerunning the tutorial easy.

Also "make docs" is just a shorthand that saves extra typing when
re-visualizing the moving pictures tree. We could integrate this into
the travis build in the future if desired (of course this would be
predicated on us getting QIIME 2 set up in the travis build, which
would add on a few minutes to each build due to Q2 installation taking
some time).

* DOC: typo fix [ci skip]

* BUG: Transpose feature tbl before matching it

So apparently QIIME 2's transformers from biom table -> pd DataFrame
produce DFs that are transposed from what biom.Table does -- QIIME 2
uses samples as the indices (rows) and features as the columns,
while biom.Table does it the other way around.

As you can imagine, this is pretty confusing! This commit should fix
this problem from our end, but in the future we should really add
logic to prevent having to do table-DF-transposition, since IIRC that
can be super slow with massive DFs.

(...We really oughta unit-test _plot.)

* STY: rm extra blank line

* TST: rename a prev matching test and add skeletons

* TST: Add "no features shared" test for matching

part of #139 fixes

* TST: test a warning msg printed during matching

* TST: Add sample dropping warning test

think this pr should be good for now

* DOC: add note to match_inputs() re #130 (TODO)

* TST: Install and use QIIME 2 env in travis build

* TST: Add actual Q2 integration test!

Addresses @ElDeveloper's comment on #154.

I'm keeping 'make docs' around since it could still be nifty (if you
just wanna regenerate the empress-tree.qzv file without rerunning
the tests, I guess).

* TST: don't run 'make docs' on travis build

Since the Q2 Artifact API test I just added does the same thing.

* TST: Add rough Q2 visualization check #154

Addresses comment from @ElDeveloper

* STY: Remove blank lines in match_inputs()

Co-Authored-By: Yoshiki Vázquez Baeza <[email protected]>

* STY: more blank line removals in docstring

Co-Authored-By: Yoshiki Vázquez Baeza <[email protected]>

* STY: rm blank lines in print_if_dropped docstring

Co-Authored-By: Yoshiki Vázquez Baeza <[email protected]>

* MNT: warn instead of printing re: sample dropping

Tests haven't been updated yet -- will do so when
--ignore-missing-samples option added in. (So this will currently
break the tests.)

This represents part of the work on addressing @ElDeveloper's comments
on #154.

* ENH: add UI skeleton for no-data sample/feat flags

Per suggestion from @ElDeveloper in #154

* STY: make _plot inputs prettier

* DOC: add ref to emperor --ignore-missing-samples

* DOC: Remove 'standalone' instructions in README

Just for now. When we resolve #140, we should add these instructions
back in (likely we'll also have to adjust these when we get to the
'initial release' of Empress on PyPI / conda-forge / etc.)

* DOC: switch feature/sample flag order, imprv docs

* ENH: Add @ElDeveloper's suggested filtering flags

This entailed substantial restructuring of match_inputs(). I also
completely deleted warn_if_dropped(), because it was honestly easier
to replace it with custom error messages for each of its 3 usages.
(Also, that thing was like 50 lines of docstring / infrastructure for
8 lines of code. It was gnarly. :P)

This isn't done yet! I still need to test this new behavior
thoroughly, and to update the tests for the old functionality
accordingly.

* MNT: Avoid redundant table DF transpositions #155

* BUG: don't display useless warning in most cases

* TST: reduce tests to just one working one

will add more back (with relevant changes to work with new behavior)
soon

* DOC: add TODO note re empty checking

* TST: add back "simple" matching error tests

* TST: add + beef up tests of matching warnings, etc

* TST: add --p-ignore-missing-samples tests

* TST: add another cornercase test

* TST: test final "warning" in matching func for now

also fixed a bug in prev test i just added in, and removed extraneous
comment

* TST: Add other check for extra s.m. sample warning

I think I'm satisfied with the new matching behavior tests, at
least for now

* DOC: update example QZV :)

* MNT: don't warn on dropped samples from s.metadata

See new comment for justification. Addresses comment from
@ElDeveloper.

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
@ElDeveloper ElDeveloper added this to the Second Alpha Release milestone Jun 22, 2020
@ElDeveloper
Copy link
Member

Should be addressed by #213

@fedarko
Copy link
Collaborator Author

fedarko commented Aug 11, 2020

I think there'll still be some extra work we could do on this issue post #213 -- the skbio.TreeNode representation of the tree is still used for a few things in the python code, and ripping these out / replacing them with uses of the bp tree would probably save us some time. (Might be worth making a new issue, though...)

@fedarko fedarko changed the title Using more straightforward types for trees/metadata in python Make Empress.Tree subclass bp.Tree rather than skbio.TreeNode Aug 27, 2020
ElDeveloper added a commit to ElDeveloper/empress that referenced this issue Aug 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants