
queue_cc for multi-gpu setup #2

Open
kmyi opened this issue Dec 7, 2018 · 3 comments

kmyi commented Dec 7, 2018

Implement a new version of queue_cc that runs multiple jobs on a node, dedicating each to a GPU. Requires on-the-fly creation of job scripts.

@kmyi kmyi assigned kmyi and wsunid Dec 7, 2018

kmyi commented Dec 7, 2018

Hey @weiweisun2018, not urgent, but if you can do this it would be great. Basically, the system currently prefers jobs that can take full nodes. Since our jobs can be batched together, it'd be a good idea to

  1. grab N jobs (4 for cedar, 2 for graham)
  2. create a new bash file for the job that has
#!/bin/bash
CUDA_VISIBLE_DEVICES=XXX jobscript1.sh &
CUDA_VISIBLE_DEVICES=YYY jobscript2.sh &
...
wait

which would be a meta job that consumes a full node.
  3. queue these jobs.
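The meta-script generation in step 2 can be sketched in a few lines of Python, assuming the per-job scripts already exist on disk (function and file names here are illustrative, not part of queue_cc):

```python
def make_meta_script(job_scripts):
    """Build a bash meta-script that pins each job to one GPU and waits."""
    lines = ["#!/bin/bash"]
    for gpu_id, script in enumerate(job_scripts):
        # Each sub-job sees only its own GPU and runs in the background.
        lines.append(f"CUDA_VISIBLE_DEVICES={gpu_id} bash {script} &")
    lines.append("wait")  # the meta job ends when every sub-job finishes
    return "\n".join(lines) + "\n"
```

For example, `make_meta_script(["jobscript1.sh", "jobscript2.sh"])` yields a two-job file like the sketch above, with GPUs 0 and 1.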


wsunid commented Dec 7, 2018

Sure, my pleasure to do it. But I have a couple of questions:
1. Do you mean submitting jobs in array style? Could you point me to documentation on this way of arranging jobs?
2. From my understanding, taking cedar as an example:

def check_ready_for_next_batch_job_and_return_batch_job(job_id):
    if there_is_no_batch_job_running(job_id):
        return next_batch_job(job_id)

batch_jobs = []

def schedule_batch_jobs():
    while True:
        for job_id in job_ids:
            batch_jobs.append(check_ready_for_next_batch_job_and_return_batch_job(job_id))
            if len(batch_jobs) == 4:
                notify_grabber()

def grab_4_batch_jobs():
    while True:
        wait_for_scheduler()
        yield batch_jobs[:4]
        del batch_jobs[:4]

def submit_full_node_job():
    thread_scheduler = threading.Thread(target=schedule_batch_jobs)
    for four_batch_jobs in grab_4_batch_jobs():
        send_4_batch_jobs_to_cedar(four_batch_jobs)  # My question here about how to submit such a full-node job: should I request 4 GPUs and more memory?


kmyi commented Dec 8, 2018

So, for example, it will be a single job containing multiple job executions inside. We want to do it this way so that we can assign each job a specific GPU; array submission probably can't support that.

In short, we submit a fake job that uses all four GPUs (e.g. cedar), which internally is just running four jobs in parallel, one for each GPU.
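On the resource question: the fake job would request all of the node's GPUs up front. A minimal sketch of how the generated submission header might look, assuming the cluster uses SLURM (the `--mem=0` all-node-memory request and the time limit are placeholder assumptions to verify against the cluster docs):

```python
def make_full_node_header(n_gpus=4, time_limit="24:00:00"):
    """Build SLURM directives for a meta job claiming a node's GPUs.

    `--mem=0` asks SLURM for all memory on the node; whether that is
    the right choice for cedar/graham is an assumption, not confirmed.
    """
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --gres=gpu:{n_gpus}",
        "#SBATCH --mem=0",
        f"#SBATCH --time={time_limit}",
    ]) + "\n"
```

The body of the generated script would then be the per-GPU `CUDA_VISIBLE_DEVICES=... &` lines followed by `wait`, as in the sketch earlier in the thread.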
