Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph tool #83

Merged
merged 62 commits into from
Jul 16, 2020
Merged

Graph tool #83

merged 62 commits into from
Jul 16, 2020

Conversation

nickjcroucher
Copy link
Collaborator

Tested with mash on OSX and sketch-lib on unix, with strain/lineage definitions, working from references and addition of queries.

Overall comments:

  • isolate names: it would be good to always refer to names (from input file) and labels (for visualisation); names are now stored in the network; new utils function isolateNameToLabel for extracting labels
  • lineage definitions: now structured more like the strain definition methods; uses a network, names by frequency; nomenclature kept ~consistent by storing 'seed' isolates for each lineage, extracted as the most connected vertex in the component (to avoid more changes with those on the periphery of the component)
  • reference graph: confusing to have a refList (relative to queryList), and a list of extracted references ("representatives" perhaps?); might be sensible to have consistent variable names for these types of variables (rlist/refList/mashOrder are usually all the same thing, for example)
  • looking through parts of the code that have not been explored for a while - noticed there are some redundant functions for reference writing, not modified here (writeTmpFile, writeDummyReferences, writeReferences)

@nickjcroucher nickjcroucher requested a review from johnlees May 14, 2020 06:30
@nickjcroucher
Copy link
Collaborator Author

Tests are failing because, with mash, PopPUNK expects 378 comparisons, but Mash generates 784 comparisons - which is likely because the ref/query datasets are the same files named differently.

@nickjcroucher nickjcroucher linked an issue May 14, 2020 that may be closed by this pull request
@nickjcroucher
Copy link
Collaborator Author

Checks now successful. Networkx can be removed from environment.yml if PR merged.

nickjcroucher and others added 25 commits July 15, 2020 13:43
Add file name correction.

Co-authored-by: John Lees <[email protected]>
Remove commented line.

Co-authored-by: John Lees <[email protected]>
Reverting merge error.

Co-authored-by: John Lees <[email protected]>
Remove whitespace.

Co-authored-by: John Lees <[email protected]>
Remove commented line.

Co-authored-by: John Lees <[email protected]>
Fix file naming.

Co-authored-by: John Lees <[email protected]>
Remove whitespace.

Co-authored-by: John Lees <[email protected]>
FIx file naming.

Co-authored-by: John Lees <[email protected]>
Fix network file naming.

Co-authored-by: John Lees <[email protected]>
@nickjcroucher
Copy link
Collaborator Author

Thanks very much for the comments, they're great, I think that's all the fixes done. Let me know if there is anything else you've spotted. The installation with conda works fine, but there's an issue with using setup.py because graph-tool cannot be installed through Pypi (https://git.skewed.de/count0/graph-tool/-/issues/208).

Copy link
Member

@johnlees johnlees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for all the changes! One final comment to remove an unused function.

Once I've released the next (and hopefully final) version of sketchlib, integrated it here, and closed #73, #77 and #79 I'll make a new release where I drop PyPI completely

PopPUNK/utils.py Outdated
@@ -161,7 +161,7 @@ def iterDistRows(refSeqs, querySeqs, self=True):
for ref in refSeqs:
yield(ref, query)

def listDistInts(refSeqs, querySeqs, self=True):
def old_listDistInts(refSeqs, querySeqs, self=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just remove this function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just hadn't updated a local change, updated in 17a3f0e.

@johnlees johnlees merged commit 08488d6 into master Jul 16, 2020
@johnlees johnlees deleted the graph-tool branch July 16, 2020 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch networkx for graph-tool
2 participants