Inquiry on implementation of parallelism on the cluster #208
Comments
To make my question clearer, please look at this example code in the documentation:
This code does not specify any process to be master or slave. But when the parallel processes all reach the sentence |
When you start your process like:
More is explained here. Note that this |
Just to clarify it again, when you run your code this way on your laptop:
then
Unfortunately, we are in 2019, and clusters/supercomputers still fail to provide support for dynamic process management, or they make it quite cumbersome to set up. So you cannot (or cannot easily) spawn new workers once the MPI execution has started. To alleviate this mess,
A minor annoyance of this execution mode is that if your code creates many executors, all of them will share the same pool of |
I intend to use multiple cores on the cluster, so I read the documentation: https://adaptive.readthedocs.io/en/latest/tutorial/tutorial.parallelism.html#mpi4py-futures-mpipoolexecutor

What confuses me is the number of processes to specify in `mpiexec -n`. The documentation says `mpiexec -n 16 python -m mpi4py.futures run_learner.py`. I am wondering why it is 16 instead of 1. (It uses 1 in "On your laptop/desktop you can run this script like: `mpiexec -n 1 python run_learner.py`"; is it just assuming that a laptop/desktop computer has only one core?)

By specifying 16, will multiple instances be created on different cores, so that adaptive.Runner() is executed on every core simultaneously? If so, it doesn't seem to benefit from parallelism, since every core is redundantly doing the same thing. I thought the expected behavior would be that there are multiple processes but only one is the master process, and the other processes take jobs from the process pool. So I guess the code will be like:

So, how do I know in which mode the Runner behaves, or whether it benefits from parallelism?
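For readers with the same question, here is a minimal sketch of the master/worker pattern being asked about. It is an illustration only, not the code from the documentation: it uses the standard library's `ThreadPoolExecutor` as a stand-in, on the assumption that `mpi4py.futures.MPIPoolExecutor` follows the same `concurrent.futures.Executor` interface (`submit`, `map`). The names `f` and `run_master` are hypothetical.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def f(x):
    """Hypothetical function to evaluate at a point."""
    return x * x


def run_master(executor: Executor, points):
    """Runs only in the master: submit jobs to the pool, then collect results.

    The workers never execute this function; they only evaluate f.
    """
    futures = [executor.submit(f, x) for x in points]
    return [future.result() for future in futures]


if __name__ == "__main__":
    # Stand-in pool. With mpi4py installed, an MPIPoolExecutor could be
    # passed here instead (assumption: same Executor interface).
    with ThreadPoolExecutor(max_workers=4) as executor:
        print(run_master(executor, range(5)))
```

The point of the sketch is that the script body is written once, from the master's point of view. As far as I understand, launching with `python -m mpi4py.futures` makes only one of the 16 processes run the script as the master while the others wait for jobs, so the script itself does not need to branch on rank.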