-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Order of magnitude faster HWCD implementation for trees #217
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
|
This is fantastic! I'm very excited about the gain in speed. I suspect that a similar gain could also be achieved for non-tree networks. 3 things:
|
Josh is working on moving things over to
Yes! I think these could be adapted for networks very easily, I just personally needed a faster implementation specifically for trees.
This looks interesting! Does a project/codebase already exist for this? There does not seem to be any software implementation referenced in the paper. |
No, but it could be very useful to many. Hence my question! |
Two functions are added here:
hardwiredClusterDistance_treelike
andhardwiredClusters_treelike
. The runtimes of these functions grow linearly with respect to the number of edges in the input tree whereas the current implementation appears to grow exponentially.This implementation is most useful when computing unrooted HWCD for large trees, because so many rooting combinations need to be checked.
Below are some figures showing that this implementation is marginally slower than the current implementation with fewer than ~75 taxa, but scales significantly better.
Runtime:
At
n=1000
, this new implementation is ~152x faster (27ms vs. 4.1s).Memory Usage:
At
n=1000
, this new implementation uses ~1122x less memory (10.4 MiB vs. 11.4 GiB).Below are benchmarks where
tre
is a 1,000 taxa tree that can be found here.Old implementation:
New implementation:
As written, these functions only work with trees, but they could be adapted to work with networks relatively easily.
Important note:
hardwiredClusters_treelike
andhardwiredClusterDistance_treelike
were written for exclusively internal use, so (1) their docstrings are not very verbose, and (2) the return type ofhardwiredClusters_treelike
does not match the return type ofhardwiredClusters
. Both of these points could be changed with relatively little work if these functions are desired over the current implementations and are adapted to work on networks as well.