
Identify and document scalability benchmarks #74

Open
mortonjt opened this issue Oct 11, 2018 · 4 comments
@mortonjt (Collaborator)

Empress needs to be run against a huge tree (> 1 million tips)
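Before a real million-tip tree is on hand, a synthetic tree can serve as a benchmarking stand-in. A minimal sketch in pure Python (the `balanced_newick` helper and the depth values are illustrative, not part of Empress or skbio):

```python
from itertools import count

def balanced_newick(depth, labels=None):
    """Build a Newick string for a balanced binary tree with 2**depth tips."""
    if labels is None:
        labels = count(1)
    if depth == 0:
        return "t%d" % next(labels)
    left = balanced_newick(depth - 1, labels)
    right = balanced_newick(depth - 1, labels)
    return "(%s,%s)" % (left, right)

def count_tips(newick):
    # a binary Newick tree with n tips contains exactly n - 1 commas
    return newick.count(",") + 1

nwk = balanced_newick(10) + ";"   # 2**10 = 1024 tips; depth 20 gives ~1M tips
print(count_tips(nwk))            # 1024
# skbio could then parse it, e.g. TreeNode.read(io.StringIO(nwk))
```

Scaling the depth parameter up gives tip counts past the >1M target without waiting on a real fragment-insertion run.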

@antgonza (Collaborator) commented Feb 3, 2020

Just wondering if there are any updates on this issue; thank you.

@antgonza (Collaborator)

I installed the latest version of Empress within a QIIME 2 2020.2 conda environment and ran it on one of the large trees generated in Qiita, using the mapping file, feature table, and taxonomies from the Moving Pictures dataset (only one dataset).

Note that this tree was created over a year ago (we could generate an even larger one today); it is the 100 bp fragment-insertion tree and has ~8.8M tips:

```python
In [1]: from skbio import TreeNode
In [2]: tree = TreeNode.read('../insertion_tree.relabelled.tre')
In [3]: print(tree.count(tips=True))
8830174
```

I generated the Empress .qzv's with no taxonomy, with Greengenes, and with SILVA taxonomy to test; each takes ~3 hrs to generate, and generation works just fine (no error messages). However, when I try to open them in https://view.qiime2.org/, the browser fails with:
[screenshot: qiime2-view-error]
and if I unzip the .qzv and try to open index.html or empress.html directly, I get:
[screenshot: error-opening-directly]
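Since a .qzv is just a zip archive, the bundled HTML can be located without a browser at all, which helps separate "generation failed" from "rendering failed". A minimal sketch using only the standard library (the `huge-tree.qzv` filename is illustrative):

```python
import zipfile

def list_html_assets(qzv):
    """Return the paths of HTML files bundled inside a QIIME 2 .qzv archive.

    Accepts a filesystem path or a file-like object, as zipfile does.
    """
    with zipfile.ZipFile(qzv) as zf:
        return [name for name in zf.namelist() if name.endswith(".html")]

# e.g. list_html_assets("huge-tree.qzv")
```

If the HTML assets are present and intact, the failure is on the rendering side (browser memory/parsing), not in qzv generation.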

Anyway, here are the testing files.

cc: @ElDeveloper

@kwcantrell (Collaborator)

@antgonza I'm looking into this

@fedarko (Collaborator) commented Aug 10, 2020

Once we identify upper bounds for what sorts of data sizes Empress can comfortably visualize, we should document this clearly in the README so that e.g. users with billion-tip trees know that they probably want to consult another tool and/or a priest ._.

@fedarko fedarko changed the title Scalability benchmarks Identify and document scalability benchmarks Aug 10, 2020