Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch networkx for graph-tool #82

Closed
johnlees opened this issue May 7, 2020 · 5 comments · Fixed by #83
Closed

Switch networkx for graph-tool #82

johnlees opened this issue May 7, 2020 · 5 comments · Fixed by #83
Labels
code Changes to the coding implementation enhancement New feature or request

Comments

@johnlees
Copy link
Member

johnlees commented May 7, 2020

See https://graph-tool.skewed.de/static/doc/index.html

@johnlees johnlees added enhancement New feature or request code Changes to the coding implementation labels May 7, 2020
@nickjcroucher
Copy link
Collaborator

(Very limited) start here: 7260e50. Slightly fiddly import code in network.py.

@nickjcroucher
Copy link
Collaborator

Working for main strain assigning functions in 5623a67.

Networkx version run on SPARC: https://microreact.org/project/9KY6djAvD (easy run took 1m56.113s on 24 threads)
Graph-tools version run on SPARC: https://microreact.org/project/yHmmBSjw3 (easy run took 1m54.125s on 24 threads)

Running detectably faster on only a few hundred genomes.

Only remaining networkx dependence is in the lineage_assignment.

@nickjcroucher
Copy link
Collaborator

Note that networks now store vertex labels as integers, not strings. Fiddly to remove and extend the network. Will need to check the behaviour with queries being added.

@johnlees
Copy link
Member Author

johnlees commented May 9, 2020

Great!
I think the integer indices may actually be a good thing, as I was beginning to think that this would be better linking with some of the other matrix code. We might need to change the code round a little, but if we keep a list (mapping int -> label in order) together with graph, which we map between at the start and end, it should be ok?

@nickjcroucher
Copy link
Collaborator

Yep, I think it is probably better, I just can't quite get my head around the way in which nodes are removed/added - e.g. after removal, the remaining are still consecutively labelled, so there needs to be some way of keeping track of what's what. It is possible to associate data with a vertex, but not sure how links are retained in e.g. shrinkage processes.

Another nice feature is that networks can be masked, e.g. for the visualisation code, rather than copied and pruned.

This version seems much faster with the refinement steps for large networks, I think it is definitely a worthwhile change.

@nickjcroucher nickjcroucher linked a pull request May 14, 2020 that will close this issue
ccoulombe pushed a commit to ccoulombe/PopPUNK that referenced this issue Oct 28, 2022
Lineage model fitting - sketchlib changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Changes to the coding implementation enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants