I have a workflow that consists of a single stage with a single task. The task is an MPI job, which uses multiple processes and is supposed to run on multiple nodes, but I find that it actually only runs on a single node. This is on the Polaris machine; the resource manager is PBS, and task launching is handled by mpiexec. Below is my EnTK script:
from radical import entk
import os
import argparse, sys, math


class MVP(object):

    def __init__(self):
        self.am = entk.AppManager()

    def set_resource(self, res_desc):
        self.am.resource_desc = res_desc

    def generate_task(self):
        t = entk.Task()
        t.pre_exec   = []
        t.executable = '/bin/echo'
        t.arguments  = ["mytest"]
        t.post_exec  = []
        # 8 MPI ranks with 16 threads each -> 128 cores -> 2 Polaris nodes
        t.cpu_reqs = {
            'cpu_processes'    : 8,
            'cpu_process_type' : 'MPI',
            'cpu_threads'      : 16,
            'cpu_thread_type'  : 'OpenMP'
        }
        return t

    def generate_pipeline(self):
        p = entk.Pipeline()
        s = entk.Stage()
        t = self.generate_task()
        s.add_tasks(t)
        p.add_stages(s)
        return p

    def run_workflow(self):
        p = self.generate_pipeline()
        self.am.workflow = [p]
        self.am.run()


if __name__ == '__main__':

    mvp = MVP()
    n_nodes = 2
    mvp.set_resource(res_desc = {
        'resource' : 'anl.polaris',
        'queue'    : 'debug',
        'walltime' : 60,
        'cpus'     : 64 * n_nodes,
        'gpus'     : 4  * n_nodes,
        'project'  : 'CSC249ADCD08'
    })
    mvp.run_workflow()
Here my MPI job is basically an echo command. I launch 8 processes, each with 16 cores, and since Polaris has 64 cores per node (32 physical cores with 64 hardware threads, and in resource_anl.json cpu_per_node is set to 64), I ask for two Polaris nodes. This should produce an output file with 8 lines of "mytest". However, I only see 4 lines of "mytest" (see task.0000.out in the sandbox below). The script runs without any error message.
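For reference, the node count used in the script follows from this arithmetic (a minimal sketch restating the numbers above; the variable names are mine, not part of the EnTK API):

import math

# Task requirement from cpu_reqs above: 8 MPI ranks x 16 threads each
cpu_processes  = 8
cpu_threads    = 16
cores_per_node = 64   # value configured for Polaris (hardware threads)

cores_needed = cpu_processes * cpu_threads                # 128
n_nodes      = math.ceil(cores_needed / cores_per_node)   # 2

print(n_nodes)   # -> 2, hence 'cpus': 64 * n_nodes in the resource description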
My understanding is that RADICAL is not generating the mpiexec command correctly. If you look at task.0000.launch.sh in the sandbox, the mpiexec command it generates is:
/opt/cray/pe/pals/1.1.7/bin/mpiexec -host x3006c0s1b0n0,x3006c0s1b0n0,x3006c0s1b0n0,x3006c0s1b0n0 -n 4 -host x3006c0s1b1n0,x3006c0s1b1n0,x3006c0s1b1n0,x3006c0s1b1n0 -n 4 $RP_TASK_SANDBOX/task.0000.exec.sh
However, I do not think this command can launch ranks on both nodes (x3006c0s1b0n0 and x3006c0s1b1n0); my guess is that only the first -host flag is recognized. I did a small test on interactive nodes: I first asked for two interactive nodes on Polaris, then ran the two commands below:
a). /opt/cray/pe/pals/1.1.7/bin/mpiexec -host x3004c0s25b1n0,x3004c0s25b1n0,x3004c0s25b1n0,x3004c0s25b1n0 -n 4 -host x3004c0s31b0n0,x3004c0s31b0n0,x3004c0s31b0n0,x3004c0s31b0n0 -n 4 echo "mytest"
(The two hostnames are obtained from $PBS_NODEFILE; this mimics what RCT does.) This outputs only four lines of "mytest".
b). mpiexec -n 8 --ppn 4 echo "mytest"
This outputs eight lines of "mytest", which is consistent with what we want.
Because of that, I think there is an issue with the mpiexec command RCT generates on Polaris. Could you take a look at that? Thanks!
PS. It seems like GitHub does not allow tar files, so I wrapped it in a zip. mpi_issue.zip
At first glance this looks like an incompatibility between mpiexec implementations. We should be able to switch to a different parameter mode. @mtitov: should we add an LM config option to always use hostfiles? That is nice for debugging anyway and would resolve problems like this (hostfiles are part of the MPI spec and should be universally supported, IIRC).
@andre-merzky agree, using a hostfile seems like a safe approach, but when I did a quick check I found several names for that option: -f or [most common] -hostfile. Would need to dig into this more.
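For illustration only, here is a minimal sketch of the hostfile-based alternative to repeated -host flags. This is not RP's actual launch-method code; the function and parameter names are made up, and both the hostfile format and the flag name would need to match the local mpiexec flavor (as noted above, -f vs -hostfile/--hostfile):

def build_hostfile_launch(nodes, exe, hostfile_path='hostfile',
                          hostfile_flag='--hostfile'):
    '''
    nodes: dict mapping hostname -> number of ranks to place on that node,
           e.g. {'x3006c0s1b0n0': 4, 'x3006c0s1b1n0': 4}
    Writes a "one hostname per rank" hostfile (a common MPICH/Hydra layout;
    other launchers may expect a different format) and returns an mpiexec
    command string that uses it instead of repeated -host flags.
    '''
    with open(hostfile_path, 'w') as f:
        for host, ranks in nodes.items():
            for _ in range(ranks):
                f.write('%s\n' % host)

    n_total = sum(nodes.values())
    return 'mpiexec %s %s -n %d %s' % (hostfile_flag, hostfile_path,
                                       n_total, exe)

# Example, using the hostnames from the report above:
cmd = build_hostfile_launch({'x3006c0s1b0n0': 4, 'x3006c0s1b1n0': 4},
                            '$RP_TASK_SANDBOX/task.0000.exec.sh')
print(cmd)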