This adds logic to pass through the file descriptors when using openmpi #95
PMI2 needs to pass through an open file descriptor. There is a way to do this with podman, but the file descriptors need to be consecutive. This fix dups the file descriptor to fd 3 (the first one after the defaults of stdin, stdout, and stderr). It then sets PMI_FD to point to the dupped fd (3).
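As a rough illustration of the dup-and-repoint step described above, here is a minimal Python sketch. It is not the code from this PR: the function name `remap_pmi_fd` and the `target_fd` parameter are hypothetical, and it assumes the launcher exports the descriptor number in `PMI_FD` as a decimal string.

```python
import os

# First descriptor after stdin (0), stdout (1), and stderr (2).
FIRST_PASSTHROUGH_FD = 3


def remap_pmi_fd(env, target_fd=FIRST_PASSTHROUGH_FD):
    """Duplicate the descriptor named by PMI_FD onto target_fd.

    Hypothetical helper, not the PR's implementation. Returns a copy of
    env with PMI_FD rewritten to target_fd. If PMI_FD is not set, the
    copy is returned unchanged and no descriptor is touched.
    """
    new_env = dict(env)
    if "PMI_FD" not in env:
        return new_env

    src_fd = int(env["PMI_FD"])
    if src_fd != target_fd:
        # dup2 silently closes target_fd first if it is already open.
        # inheritable=True keeps the copy open across exec in the child.
        os.dup2(src_fd, target_fd, inheritable=True)
    else:
        # Already in the right slot; just make sure the child inherits it.
        os.set_inheritable(target_fd, True)

    new_env["PMI_FD"] = str(target_fd)
    return new_env
```

The returned environment would then be handed to the container launch. Podman's `--preserve-fds=N` option passes N descriptors beyond the standard three, starting at fd 3, which would explain why the descriptor has to sit in a consecutive slot; whether the PR uses exactly that option is an assumption here.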
Thanks, Shane. The overall setup seems ok. It's good to have something working in the short term even if it's a little hacky. I'll copy it over to muller to test.
I wonder if eventually we should fold this into some kind of --pmi module, which could set shared-run and the PMI_FD variable internally. Users might eventually stack it with an --openmpi module, although I admit I haven't thought this all the way through yet.
For now, if we leave it as an environment variable, it would be good to document it in the README.
Tested on muller with an openmpi helper module. I'll open a separate MR for the helper module.
Found out this was wrong, see next comment. Update: this seems ok on 1 node but fails on 2 nodes. In my test it's because the number of the file descriptor differs.
False alarm: I hadn't updated podman_hpc.py on the second node of my reservation. Sorry about that.
stephey@nid001003:/mscratch/sd/s/stephey/openmpi> srun -n 2 --mpi=pmi2 podman-hpc run --rm --openmpi-pmi2 -v $(pwd):/work -w /work registry.nersc.gov/library/nersc/mpi4py:3.1.3-openmpi ./print.sh
PMI relies on passing through open file descriptors. Podman supports this, but there are some extra steps needed to make it work.