Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when parsing Silva taxonomies #129

Closed
antgonza opened this issue Nov 22, 2019 · 5 comments · Fixed by #183
Closed

Error when parsing Silva taxonomies #129

antgonza opened this issue Nov 22, 2019 · 5 comments · Fixed by #183
Assignees
Labels
Milestone

Comments

@antgonza
Copy link
Collaborator

This is more an FYI as this was previously reported in QIIME2: https://forum.qiime2.org/t/qiime-taxa-filter-table-error/3947/10

If you try to use Silva based taxonomies you get this error:

There was an issue with viewing the artifact merged_seqs-taxonomy-silva.qza as QIIME 2 Metadata:

  CategoricalMetadataColumn does not support values with leading or trailing whitespace characters. Column 'Taxon' has the following value: 'D_0__Bacteria;D_1__Gemmatimonadetes;D_2__Gemmatimonadetes;D_3__Gemmatimonadales;D_4__Gemmatimonadaceae;D_5__Gemmatimonas;D_6__uncultured bacterium '
@fedarko
Copy link
Collaborator

fedarko commented Jun 15, 2020

@antgonza I'm working on testing that this error is resolved right now -- do you happen to still have the data that caused this lying around?

fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020
I think this addresses biocore#129, at least to the extent that we can
handle this without messing with QIIME 2. But I'm not closing it
until I can verify (if poss.) that the original error is rectified.
fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020
fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020
fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020
@fedarko
Copy link
Collaborator

fedarko commented Jun 16, 2020

Ok, updates on this: I can replicate the error @antgonza reported by, in QIIME 2 2019.7, running qiime metadata tabulate on the taxonomy file he was using (thanks for sharing it with me!). In QIIME 2 2019.10, the error is fixed, and the taxonomy file can be viewed as metadata without the error. I confirmed also that generating an Empress visualization from @antgonza's files works without an error in QIIME 2 2019.10.*

This is because the issue causing this problem was fixed in the 2019.10 version of QIIME 2, so I've just updated the README accordingly to request that users use at least that version. Just to be super extra careful, I've also added tests that verify that leading/trailing whitespace (including testing the exact D_5__Gemmatimonas annotation above) are handled properly.

So I think we can close this issue when the feature metadata branch (to which the above changes have been made) is merged in.

* (I had to generate the visualization on barnacle, since my computer killed the Empress python code... when I tried to view the resulting QZV on chrome, my browser wouldn't load it after ~5 minutes so I just gave up. However, the visualization was at least generated successfully. This is more motivation for improving the tool's performance as discussed in #74, #151, etc.)

@antgonza
Copy link
Collaborator Author

Thank you @fedarko!

BTW the shared files are from a meta-analysis from 3 host types (rats, mice, human) and only 5 qiita studies ... highlighting the importance of fixing/improving those other issues!

@antgonza
Copy link
Collaborator Author

Thinking about this, @fedarko, @ElDeveloper should those performance issues be part of the alpha release?

@ElDeveloper
Copy link
Member

@esayyari is working on a PR to make loading times faster. We should be able to hear more updates from him soon.

ElDeveloper added a commit that referenced this issue Jun 19, 2020
* ENH: Add initial stab at circular layout

Needs further inspection and testing, and of course the JS
needs to be modified to draw "vertical lines" as for the rect. layout
(but now with curves, etc.) but this seems like a decent start

* DOC: add details re circ layout alg and TODOs

* ENH: draw circ layout arcs (not beziers yet), etc

Lotta ugly hacky code in this commit. Will get things working
then from there backtrack to make things pretty and well tested before
the PR.

This is looking actually pretty nice. Will probably need to fix the
"shape" of the circular layout though LOL

* MNT: don't generate arc info for root (circlayout)

* BUG: still store pos data for root

* MNT: on 2nd thought draw "arc" for c.layout root

consistency will be useful, and we can adjust later as needed

also improved js docs for # lines needed

* DOC: document c0/c1 stuff for circ layout better

* BUG: Actually draw tips for circ layout

turns out that the reason things in this layout looked 'incomplete',
off, etc. was that xc0 and yc0 info was only being stored for internal
nodes, giving the dubious impression that penultimate nodes were tips.

With this, remaining circular layout TODOs are

1) drawing fancy curved arcs for internal nodes
2) adjust the UI to get the thick line stuff to work properly with
   the circular layout
3) clean up code, esp messy portions (like maybe store xc0/xc1 info
   in the layoutToCoordSuffix object somehow, by making it interpret
   an array of two suffixes as start and endpoints)
3) add tests

* MNT: Don't store root arc info; smooth arcs; docs

Improved documentation on *why* attempting to draw an arc
for the root is kinda useless. Hopefully shouldn't have to update
that again ._.

I split up the line segment for the circular layout into two, one
connecting an internal node's endpoint with its arc start and another
connecting the node endpoint with its arc endpoint. This makes the
layout look A LOT better; it's still essentially an approximation,
though. What we need to do is hijack the WebGL code to draw bezier
curves (starting and ending at the arc start/endpoints, and passing
through the internal node endpoint) instead. Uh, hopefully that isn't
too difficult? LOL I guess we'll see.

* DOC: improve circ layout coordinate docs

Addressing some ambiguities, and fixing an error w prev commit docs

* DOC: add note re: internal node arcs + cornercases

* BUG/MNT: Proper thick lines in c.layout; code chgs

I moved the general "corner-computing" code from within
Empress.thickenSameSampleLines() to its own function within VectorOps.

This allows a lot more code reuse, but I think we could make the code
even more compact by allowing _addTriangleCoords() to accept
{tL: ..., ...} objects directly.

* MNT: Make _addTriangleCoords accept corners obj

... Instead of accepting individual corner coordinates. This makes
life a bit easier, or at least makes the code a bit cleaner (since
now we can funnel the output from VectorOps.computeBoxCorners()
straight into Empress._addTriangleCoords()).

Of course it's worth noting that making the circular layout draw
the arcs with curved lines/beziers (and not just as the disjointed
line segments) will probably mean we'll have to abandon this
particular function for that particular use case. for now, tho, this
looks pretty nice

* for reference, update qzv

* ENH: Match up feature metadata w/ tree nodes

Progress on #130. Next up is testing this (... and un-breaking the
tests in general, my bad).

* TST: Un-break rect layout tests

The tests were broken due to me renaming lowestchildyr, etc. to
lowest_child_yr. Fixed.

* TST: Un-break matching tests due to #130 work

All that was really needed was just unpacking a third return
value (feature metadata). I added a few tests that this value was
None, but we'll test this in more depth shortly.

* STY: clean JS line-thickening code

Some repeated declarations ("var corners", "var x1", etc.) were
making jshint angry, so I addressed them (for corners, by declaring
it up front at the start of the function in question; for the x1/y1/
etc. stuff, by not even assigning them explicitly to variables and
just by passing them directly to the function I was calling).

* STY: Minor flake8 fixes

The build should be working now 💯

* TST: Add fm for testing; don't name internal nodes

the "don't bother naming internal nodes" change comes from the fact
that now that step is done after matching, anyway, so there's
no point in keeping this line in the tests

* TST: add basic feature metadata matching test #130

* TST: test feature metadata mismatch error :) #130

* TST: test partial feature dropping in fm matching

done with this part, gonna move on to taxonomy handling / etc. next

* ENH: Add taxonomy-splitting func signature/docs

(still need to accidentally implement it lol 💯)

* ENH: Draft taxonomy splitting code skeleton

the actual meat of this function isn't done yet, but this should at
least handle most (but not all, Error condition number 3 is still
unimplemented) of the funky corner cases.

* STY: flake8 line spacing

* ENH: validate ; ct. and do tax splitting #130

* TST: add basic tax splitting infrastructure tests

* MNT: tax splitting->new module; allow uneven ; cts

Lots of (untested) changes here. Need to test this stuff...

* BUG/TST: Fix "Level" col bug, and add tests for it

tldr code was lowercasing all col names and checking to see if
any started with "Level", which obvs wouldn't be the case since
"Level" would get turned to "level". Fixed problem.

* MNT: Put Level cols at start of fm

* TST: add partial tax splitting "good" case test

at least we know it works know lol

* TST: Finish "good" tax-splitting test

* TST: test funky tax splitting corner case

* TST/MNT: warn on no-; case and add test

* ENH: add UI skeleton for f. metadata coloring #130

* ENH/DOC: add fancy notes for coloring tabs

:D:D:D

* ENH: Chuck feat. md over to JS; populate selectors

So, now we have access to feature metadata in the Empress JS object,
and the selectors in the feature metadata coloring tab have the
f.m. fields populated there.

* MNT: Make feat. md. data transfer simpler

has advantage of preserving column order ;)

* DOC: Add inline docs to layout + anim. panels

* DOC: update QZV for reference

* ENH: Add initial, hacky f.m. coloring prototype!

:D:D:D:D

* STY: fix prettier issues

* BUG: Make code work, and fix UI, if no f.m. passed

UI could use some extra work -- maybe set the initial "hidden" on
the fm button directly from jinja2, if possible. but this at least
works?

* BUG: remove ref to missing UI ele in JS

this makes unchecking the feature metadata checkbox work properly
-- what was happening was the fHideChk thing was getting referenced,
which (silently from the user perspective ...) crashed the "close"
function, which prevented the tree from being reset and redrawn.
js, amirite

* STY: fix flake8 spacing pbm

* DOC: Describe use of f.m. in CLI docs

* ENH: color internal nodes with uniform child f.md

* Update qzv

* circular layout

* Add QZV including BOTH circ layout AND fm coloring

:D :D :D :D :D :D

* Show feature metadata for tips in select-node menu

Helps a lot for #130

* ENH: improve node hover menu UI

* Consistent table styling, better frozen header col

satisfied for now

* Show a table of feature metadata in the node hover menu; improve the existing styling in the menu (#166)

* Show feature metadata for tips in select-node menu

Helps a lot for #130

* ENH: improve node hover menu UI

* Consistent table styling, better frozen header col

satisfied for now

* STY: prettier fixes

* TST: add docstrings for f.m. matching tests

* ENH: fix minor grammar issue in node menu

* MNT: Clean code for name_internal_nodes()

- unlabled -> unlabeled
- use str.format() rather than % for formatting node names
  (seems to be preferred in modern python code)

* BUG: fix one last "unlabled"

* TST / BUG: Explicitly disallow nonunique tip names

I thought we were already checking for this, but turns out we
weren't.

* ENH/TST: warn when internal nodes share name

Per advice from @ElDeveloper

worth noting that the current check is bugged: it doesn't check
the root node's name. We should fix that.

* BUG/TST: Also consider root node name for warnings

* TST/BUG: raise error if tip name in int node names

Another annoying corner case it's good to detect and handle.

Note that we purposefully ignore "None"-named nodes, since those
will be replaced with unique IDs later.

* MNT: Rename name_internal_nodes() more accurately

Because:
  1) it wasn't just renaming internal nodes
  2) it also assigns missing branch lengths (although this is NOT
     tested as far as I can tell)

* MNT: fill_missing_node_data -> _node_names

SO: turns out that the validation code i added a while back re:
nodes with missing branch lengths was unnecessary, since
name_internal_nodes() already handled that.

Well, at this point the simplest thing to do is just remove that
functionality from name_internal_nodes() (now named
fill_missing_node_names()), since the validation should already
preclude missing branch lengths. (If this is annoying, we can
set things so that missing branch lengths are replaced with 0 or
1 or something, but I personally don't think that's a great idea.)

* STY: add extra blank line for flake8

* BUG: Handle EmpressNode* corner case

should be a temporary solution, but I don't anticipate this being
a problem for the majority of use cases.

* MNT: explicitly ignore None-named tips in matching

I don't believe this was doing anything before, but it could
have ostensibly been a problem if there was a feature named
None (like, the object None) in the table. (And I don't *think*
that will ever happen since we're getting these tables from
QIIME 2, which uses biom-format, which treats IDs as strings.)

* TST: test corner-case feature metadata matching

Some funky things re: 1) root of the tree being allowed to have fm
and 2) duplicate internal nodes being allowed to have fm

* STY: minor fixes to _plot.py

- Use a safer variable name than "file", which is a reserved keyword
  in some versions of python:
        https://stackoverflow.com/a/24942363/10730311
  (I doubt that this would cause problems since I'm pretty sure
  Empress only supports python 3, but we might as well be safe.)

- Use more reasonable-looking indentation for a jinja2 call

* fix typo in biom table js

* MNT: remove logging stmt I left in accidentally

* ENH: Add UI skeleton for two fm coloring options

* fix a grammar issue

* MNT: Refactor to store fmd as tip/int node md

This will make implementing the different JS "coloring methods" a lot
easier.

Need to update the tests.

* TST: Verify that taxonomy whtspcs handled right

I think this addresses #129, at least to the extent that we can
handle this without messing with QIIME 2. But I'm not closing it
until I can verify (if poss.) that the original error is rectified.

* TST: test specific SILVA annotation from #129

* fix typos

* TST: start fixing old match tests re: t/i fm split

* TST: fix more matching tests; imprv matching docs

... since i forgot about the 'moving tax splitting to match_inputs()'
change i made earlier today :P

* TST: Finish un-breaking old matching tests

Still need to, like, add on new tests that explicitly verify that
the splitting stuff works, but these changes essentially do that
already

* DOC: bump min q2 version to 2019.10

see #129

* TST: add clarifying comment re: #129 test

* Finish up merging stuff

* add back index.html gen stuff

i accidentally deleted during merging but we need this ._.

* temporarily rm circ layout

(it'll be added back when #180 is merged in)

* MNT: Abstract pretty sample table generation

... And use the new function for both leaf *and* internal node
selection menus.

The new function is a static method, so it should be pretty simple
to test (knock on wood). Need to actually test this though!!!!

This addresses one of the TODOs on #169.

* Remove #tree-container CSS block: close #168

This was getting on my nerves, so I just knocked it out now ._.

* DOC/MNT: clear tableEle HTML in s.tbl.gen; docs

* TST: Test tip/int node md cols, and details

Crucially, this includes the "just internal nodes" case :)

* TST: Un-break core tests

... By adding the tip/int metadata stuff to DICT_A.

We should definitely add tests that make sure that this info is
properly transmitted at some point, tho.

* TST: test feature metadata and _to_dict()

* BUG: be consistent w/ "EmpressNode" btwn python/js

We might just wanna not special-case this altogether tbh

* STY: fix js/py style issues - unbreak travis :D

* BUG: enable selecting the root node #169

* update qzv

* Fix typo

* ENH: Show int node fmd, simplify fm table code

Also fixed a few typos ("uniqe" -> "unique") and bolded the warning
about duplicate node IDs.

The code for this is p messy (although it does demonstrably work :D)
so I gotta fix this...

* MNT: Abstract fmd table gen code to sep func

... so we can use it for both internal nodes and tips.

TODO, need to add docs + tests for this ...

* DOC: Add docs for makeFeatureMetadataTable()

* DOC: make plugin setup fm usage desc more faithful

* BUG: Remove cursor: pointer on text/number inputs

I think this should make it clearer to the user that you can type
into these fields...

* ENH: Add int. node coloring method; refactoring

F. metadata coloring method is now configurable!!! only took me
forever ._.

* STY: js style fixes

* STY: fix variable re-defining pbm

* ENH: On changing fm method, change checkbox desc.

The wording could probably use some work, but this does the job.

* MNT: tidy up side panel code a bit

* BUG: Use linethickening for fm; readd desc updates

(The reason I held off on the line-thickening stuff is it's all
"thicken same sample lines" and stuff, but at this point we're
hijacking all of the sample coloring stuff to color by whatever.)

* DOC: update fm TODO comment

* MNT: Abstract out a LOT of side-panel-handler code

... into shared functions between the feature / sample coloring code.

This should have the side effect of making testing easier. I think?

In any case, the documentation seems a bit nicer :)

* DOC: update _updateColoring re funky method param

at least justifies why it's kinda wack

* BUG: prevent sample/ftr opts simultaneously shown

getting that commit title down to 50 characters was a struggle,
but you get the idea.

This involved adding back some functions to side panel handler, but
these are a lot more clear and concise so I don't mind them as much.
or maybe i'm just biased because i wrote them lol

* ENH: clean node selection menu; improve descs #169

Another thing ticked off for #169.

Something i thought of just now - would be good to say "Name" instead
of "ID" for nodes maybe? skbio treenode treats these as Names not IDs,
and according to wikipedia
(https://en.wikipedia.org/wiki/Newick_format) newick seems to mostly
consider these as "names" also.

* STY: fix JS style; redo qzv

* MNT: Rename _validate_data: makes behavior clearer

Addresses a comment from @kwcantrell

* DOC: Clean up _to_dict fm usage

Things should be a lot clearer now. Addresses points brought up
in @kwcantrell's review

* Clarify more comments in _to_dict

* BUG: Reset fm coloring "update" btn in selcallback

Addresses a comment from @ElDeveloper

* STY: Rename Empress ctx var name for closure

This is somewhat more consistent. Addresses an @ElDeveloper comment.

* STY: minor prettier thing

* MNT: Rename hover-* stuff to menu-*

Addresses comment from @kwcantrell

* Update empress/support_files/js/vector-ops.js

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

* Only use {{ feature_metadata_columns }} once

addresses @ElDeveloper comment

* Add + update copyright headers to all python files

* Update empress/taxonomy_utils.py

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

* Simplify max ; ct computation

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>

* Avoid unnecessary set() conversions

Addresses suggestion from @ElDeveloper

* Use @ElDeveloper suggestion for tax splitting code

* Update qzv

* Revert "Use @ElDeveloper suggestion for tax splitting code"

This reverts commit ce8e26c.

* ... update qzv from old logic

* unbreak circ layout js test

* Un-break python test

Thanks @esayyari for debugging this!

Co-authored-by: kcantrel <[email protected]>
Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants