[Feature Request] Increase generalizability of software #12

Open
jolespin opened this issue Aug 15, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@jolespin

jolespin commented Aug 15, 2024

First off, incredible thanks for developing this software suite! I've been struggling to calculate KEGG module completion ratios (MCR) at scale ever since KEGG removed their MAPLE software, although even MAPLE couldn't run at scale because it depended on the web interface. I found a KEGG MCR calculation in MicrobeAnnotator that I reimplemented in my VEBA software (doi:10.1093/nar/gkae528), but the methodology from the original implementation is hard-coded and doesn't capture alternative paths as well as it could. Separately, I've been developing some "shortest path" based approaches for MCR calculations but would rather use your implementation since it's already further along.

For general usage, I'd like to recommend a few things:
0. Remove forced dependency versions

biopython==1.83
networkx==3.3
graphviz==0.20.3

This is very restrictive. Maybe you can do something like graphviz>=0.20.3 if this is the minimum version.

  1. Move all of the functions into a package with few dependencies. This will allow one to load in the functions to run the tool internally within a Python environment (e.g., if they want to include this as a dependency for another package as I plan to do) or run externally with a command line interface (current usage)
  2. For the "list" option, I would recommend using line breaks instead of commas. The reason for this is that most tools (e.g., grep, seqkit, skani) take in identifier lists with each item on a new line.
  3. Provide a batch option that accepts a table of [id_genome, id_ko] pairs (or alternatively [id_contig, id_ko] if the data are not genome-resolved)
  4. Make some of the packages optional with an error message telling you to install it if you're trying to run functionality that requires it and it's not installed (e.g., Biopython). The idea here is to keep the package as lightweight and flexible as possible.
  5. Write output to a directory instead of using the output argument as a base name prefix
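
To make items 2 to 4 concrete, here is a minimal sketch of what I have in mind. All function names (`require_optional`, `read_ko_list`, `read_batch_table`) are hypothetical and are not part of the current tool; the tab separator for the batch table is also just an assumption.

```python
import importlib
from collections import defaultdict


def require_optional(module_name, pip_name, feature):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper: module_name is the import name (e.g. "Bio"),
    pip_name is the name used for installation (e.g. "biopython").
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{pip_name}'. "
            f"Install it with: pip install {pip_name}"
        ) from exc


def read_ko_list(text):
    """Parse a newline-delimited KO list, skipping blank lines (item 2)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_batch_table(text, sep="\t"):
    """Group [id_genome, id_ko] rows into {id_genome: set of KOs} (item 3).

    Assumes one pair per line with no header; extra columns are ignored.
    """
    genome_to_kos = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        id_genome, id_ko = line.split(sep)[:2]
        genome_to_kos[id_genome.strip()].add(id_ko.strip())
    return dict(genome_to_kos)
```

With something like this, Biopython would only be imported inside the functions that actually need it, e.g. `SeqIO = require_optional("Bio.SeqIO", "biopython", "FASTA parsing")`, and each genome's KO set from the batch table could be passed to the existing completeness calculation one at a time.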
@KateSakharova KateSakharova self-assigned this Aug 20, 2024
@KateSakharova
Contributor

Hi @jolespin, Thank you very much for your suggestion!
I will definitely improve the tool in a new release!
Best,
Kate

@KateSakharova KateSakharova added the enhancement New feature or request label Aug 20, 2024
@jolespin
Author

jolespin commented Aug 23, 2024

Hi @KateSakharova, just reaching out to let you know that I tried pushing some changes to your repo to address the items above but had a lot of difficulty with the package structure/layout.

I need the pathway completion functionality for my VEBA software suite (https://github.com/jolespin/veba) to alleviate some bottlenecks in my workflow. I'm currently backlogged on some analysis, and I realized that it would be faster for me to reimplement the tool than to push/pull changes to the current repo. However, I would be more than happy to help integrate the changes into your package if you are interested. In the meantime, I have fully acknowledged you at the top and bottom of the repo so people know that the theory and base code are credited to you.

The reimplementation is below:
https://github.com/jolespin/kegg_pathway_profiler/

I designed the reimplementation so it can be used within Python and through the CLI
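
As a rough illustration of that dual-use layout (not the actual API of either package), the pattern is a plain importable function plus a thin `argparse` wrapper that writes into an output directory. The names `coverage` and `main`, the flags, and the output file name are all hypothetical, and `coverage` is only a placeholder metric, not the graph-based completeness calculation from the original tool.

```python
import argparse
from pathlib import Path


def coverage(kos, module_kos):
    """Placeholder metric: fraction of a module's KOs that were observed.

    This is NOT the graph-based completeness from the original tool;
    it only stands in for "some function callable from Python".
    """
    module_kos = set(module_kos)
    if not module_kos:
        return 0.0
    return len(module_kos & set(kos)) / len(module_kos)


def main(argv=None):
    """Hypothetical CLI wrapper around the same function."""
    parser = argparse.ArgumentParser(description="Illustrative CLI wrapper")
    parser.add_argument("-i", "--input", required=True,
                        help="File with one KO identifier per line")
    parser.add_argument("-m", "--module", required=True,
                        help="Comma-separated KOs that make up one module")
    parser.add_argument("-o", "--output_directory", default="output",
                        help="Directory to write results into")
    args = parser.parse_args(argv)

    lines = Path(args.input).read_text().splitlines()
    kos = [line.strip() for line in lines if line.strip()]
    score = coverage(kos, args.module.split(","))

    # Write into a directory rather than using the argument as a prefix
    out_dir = Path(args.output_directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "coverage.tsv").write_text(f"module\t{score:.3f}\n")
    return score
```

From Python one would call `coverage(...)` directly, while a console entry point would simply call `main()`.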

If you would like me to make any adjustments (e.g., adding a preprint or any other citations), please let me know.

This package is a reimplementation of kegg-pathways-completeness-tool (e.g., base code and theory).
For any publications or usage, please cite the original implementation and credit the lead developer (See Acknowledgements below).

Acknowledgements:
Ekaterina Sakharova, the developer of the original implementation, kegg-pathways-completeness-tool.
