Scheduler system instead of multiprocessing #2

Open
alexhbnr opened this issue Apr 1, 2020 · 0 comments
Labels
enhancement New feature or request

alexhbnr commented Apr 1, 2020

Hi everyone,

First of all, thanks for keeping up the great work. I was wondering about the design philosophy behind how PhyloPhlAn executes its commands, and whether Python's multiprocessing module is the best approach for very large data sets (> 10,000 species).

While multiprocessing makes it possible to distribute individual tasks across multiple cores of a single compute node, it is limited by the size of that node. In the environments I currently work in, we typically have a large number of nodes (> 50) that are relatively small (36 cores on average) and connected by a scheduling system. On such a system, PhyloPhlAn would benefit substantially if the individual tasks were submitted to different nodes rather than all run on a single node.

I think the main functionality of PhyloPhlAn (the function standard_phylogeny_reconstruction in phylophlan.py) could easily be replaced by a pipeline built with Snakemake or Nextflow. Snakemake in particular should be an easy port because it uses Python for configuration. Such a pipeline would suit both scenarios, one big node with many cores or many small nodes with fewer cores, because the user could decide whether to run it locally or through a scheduling system, e.g. SLURM.
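To make the idea concrete, here is a minimal Python sketch of the separation I have in mind: the list of tasks is built once, and only the backend decides whether they run on the local node or get wrapped as SLURM jobs. Everything in it is an assumption for illustration. build_commands, run_local and to_sbatch are hypothetical names, the per-input commands are placeholders rather than real PhyloPhlAn steps, and the sbatch command lines are only constructed, never submitted.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_commands(inputs):
    """Return one command (argv list) per input.

    Placeholder commands only; the real PhyloPhlAn steps would go here.
    """
    return [
        [sys.executable, "-c", f"print({('processed ' + name)!r})"]
        for name in inputs
    ]


def run_local(commands, max_workers=4):
    """Run all commands on the current node, a few at a time."""
    def run_one(cmd):
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return done.stdout.strip()

    # Threads are enough here: each one only waits for an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


def to_sbatch(commands, cpus_per_task=1):
    """Wrap each command as an sbatch call (built only, not submitted)."""
    return [
        [
            "sbatch",
            f"--job-name=task{i}",
            f"--cpus-per-task={cpus_per_task}",
            f"--wrap={shlex.join(cmd)}",
        ]
        for i, cmd in enumerate(commands)
    ]


if __name__ == "__main__":
    cmds = build_commands(["genomeA", "genomeB"])
    print(run_local(cmds))           # one big node
    for job in to_sbatch(cmds):      # many small nodes
        print(shlex.join(job))
```

A workflow manager such as Snakemake or Nextflow would add what this sketch leaves out, in particular dependency tracking between steps and resubmission of failed jobs, which is why I would prefer one of them over hand-written submission code.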

Have you already considered this and, if so, what led you to decide against it?

Thanks, Alex

@fasnicar fasnicar added the enhancement New feature or request label Apr 1, 2020