Job management with Slurm
Most of the jobs on UseGalaxy.no will be managed by Slurm and run inside a Singularity container with the dependencies required by the tool. Our compute backend consists of 2 permanent compute nodes plus extra nodes that are dynamically added and removed by ECC on demand (at least two ECC nodes should always be active).
- slurm.usegalaxy.no ➤ Our main compute node running on a VM on our physical hardware (20 cores, 200GB memory)
- nrec2.usegalaxy.no ➤ A permanent compute node from NREC (32 cores, 128GB memory)
- eccN.usegalaxy.no ➤ Extra dynamic nodes managed by ECC (32 cores, 128GB or 256GB memory)
In addition, some special tools will run outside of Slurm on the local node, which is the VM that also runs Galaxy itself (usegalaxy.no). The reason for this is that they depend on requirements which we have so far not been able to include in the default container, but which are available in the virtual environment that Galaxy runs in.
Here is a cheat sheet of the most commonly used Slurm commands.
The sinfo command will display information about nodes and partitions, including the state of each node.
sysadmin@usegalaxy ~ $ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
usegalaxy_production* up infinite 1 down* ecc3.usegalaxy.no
usegalaxy_production* up infinite 1 mix slurm.usegalaxy.no
usegalaxy_production* up infinite 3 idle ecc1.usegalaxy.no,ecc2.usegalaxy.no,nrec2.usegalaxy.no
sysadmin@usegalaxy ~ $ sinfo --long --Node
Fri Dec 10 15:02:10 2021
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
ecc1.usegalaxy.no 1 usegalaxy_production* idle 32 32:1:1 257785 0 1 (null) none
ecc2.usegalaxy.no 1 usegalaxy_production* idle 32 32:1:1 257785 0 1 (null) none
ecc3.usegalaxy.no 1 usegalaxy_production* down* 32 32:1:1 257785 0 1 (null) Not responding
nrec2.usegalaxy.no 1 usegalaxy_production* idle 32 32:1:1 128769 0 1 (null) none
slurm.usegalaxy.no 1 usegalaxy_production* mixed 20 20:1:1 201342 0 1 (null) none
Some common node states (non-exhaustive list):
- idle : No jobs are currently running on the node
- alloc : All the CPUs on the node are allocated to jobs and it cannot accept more until some have completed
- mix : The node is currently running jobs but it still has capacity to run more
- drain : The node will not accept new jobs but will complete the jobs currently running
- down : The node is unavailable for use and no jobs can run on it
An asterisk (*) after the state means that the node is not responding.
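To list only the nodes that are down, drained or failing, together with the recorded reason, you can use the -R (--list-reasons) option:
sysadmin@usegalaxy ~ $ sinfo -R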
The scontrol command is used to view or modify Slurm configuration, including job, job step, node, partition, reservation, and overall system configuration. This must be run as sudo.
If you need to take down a node, you can use the scontrol update command to set the state of the node to down or drain. The former will immediately take down the node and kill all currently running jobs, while the latter will allow running jobs to finish but no new jobs will be assigned to the node.
sysadmin@usegalaxy ~ $ sudo scontrol update nodename=<node> state=drain reason='problems with this node'
If a node is currently in a drain or down state, you can bring it up again by setting the state to resume.
sysadmin@usegalaxy ~ $ sudo scontrol update nodename=<node> state=resume
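You can inspect the full state of a node, including any recorded drain reason, with the show subcommand:
sysadmin@usegalaxy ~ $ sudo scontrol show node <node>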
If a job has been queued but is not starting because the required resources are not available (it is in pending state), you can force the job to start with a new set of requirements with the following command:
sysadmin@usegalaxy ~ $ sudo scontrol update job=<jobID> <LIST OF NEW SPECIFICATIONS>
For instance, to change the number of CPUs and memory requested by the job, run:
sysadmin@usegalaxy ~ $ sudo scontrol update job=<jobID> numcpus=<cores> minmemorynode=<MB>
This command can be especially useful on the test server, where each compute node only has 4 cores and 15 GB memory. The same "tool_destinations.yaml" configuration file is used on both the test and production servers, and many of the tools are configured to require more resources than this and will thus fail to run on the test server. Specifying a lower number of CPUs and less memory will allow the job to run on the test server anyway. You can see a full list of specifications that you can change here. To see a job's current specifications, run scontrol show jobid <jobID>.
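For example, to check the current CPU and memory requirements of a job before changing them:
sysadmin@usegalaxy ~ $ sudo scontrol show jobid <jobID>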
If a job fails on a node, it can be "held" by Slurm rather than being queued on a different node ("launch failed requeued held"). In this case, you can use the release command to start the job again. (Likewise, you can intentionally hold a pending job to prevent it from running with the hold command, as shown below.)
sysadmin@usegalaxy ~ $ sudo scontrol release <jobID>
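To intentionally hold a pending job:
sysadmin@usegalaxy ~ $ sudo scontrol hold <jobID>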
The requeue command will put a running, suspended or finished job back into a pending state, so it will execute again when resources become available. If some jobs are taking unexpectedly long to complete and you suspect it might be caused by issues with the compute node, you can drain the node and requeue the job to start it on another node instead. (Note that the job may not start right away even if resources are available, if the 'EligibleTime' of the job is in the future. You can check this attribute with scontrol show job <jobID>.)
sysadmin@usegalaxy ~ $ sudo scontrol requeue <jobID>
If a user has submitted lots of big jobs that are hogging all the resources on the compute backend, you can use requeuehold to remove some of them from the cluster and place them back in a pending state. The jobs will not be started until you explicitly release them at a later time.
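For example:
sysadmin@usegalaxy ~ $ sudo scontrol requeuehold <jobID>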
The squeue command shows information about jobs located in the Slurm scheduling queue, including currently running jobs and jobs still waiting to be run.
sysadmin@usegalaxy ~ $ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
143925 usegalaxy g155494_ galaxy PD 0:00 1 (Priority)
143926 usegalaxy g155495_ galaxy PD 0:00 1 (Priority)
143927 usegalaxy g155496_ galaxy PD 0:00 1 (Priority)
143928 usegalaxy g155277_ galaxy PD 0:00 1 (Priority)
143233 usegalaxy g154704_ galaxy R 4:54:37 1 slurm.usegalaxy.no
143268 usegalaxy g154737_ galaxy R 4:39:10 1 ecc1.usegalaxy.no
143561 usegalaxy g155184_ galaxy R 3:16:55 1 ecc1.usegalaxy.no
143759 usegalaxy g155376_ galaxy R 33:54 1 slurm.usegalaxy.no
143801 usegalaxy g155154_ galaxy R 1:43 1 slurm.usegalaxy.no
143906 usegalaxy g155476_ galaxy R 2:24 1 slurm.usegalaxy.no
An "R" in the fifth column (STATE) means the job is currently running. If a job is still waiting for resources to become available, the state will be pending (PD). For the meaning of other states, see this list of job state codes.
Running jobs can be cancelled with the scancel <jobID> command. This must be run as sudo.
sysadmin@usegalaxy ~ $ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
130807 usegalaxy g139876_ galaxy R 23:49:11 1 slurm.usegalaxy.no
sysadmin@usegalaxy ~ $ scancel 130807
scancel: error: Kill job error on job id 130807: Access/permission denied
sysadmin@usegalaxy ~ $ sudo scancel 130807
sysadmin@usegalaxy ~ $ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
The sstat command can be used to display information about currently running Slurm jobs. The jobID below can be a single Slurm job ID or a comma-separated list of numbers.
sysadmin@usegalaxy ~ $ sudo sstat --allsteps -j <jobID> | less -S
You can also get information about the processes (PID numbers) associated with each job step using the --pidformat option (or -i for short).
sysadmin@usegalaxy ~ $ sudo sstat -i --allsteps -j <jobID>
Use the --format option to include only a few selected fields in the output. Run sstat --helpformat to see which fields are available.
sysadmin@usegalaxy ~ $ sudo sstat --format=JobID,NTasks,AveCPU,AvePages,AveRSS,AveVMSize,MinCPUNode,MaxDiskRead,MaxDiskWrite,MinCPU --allsteps -j <jobID>
The sacct command can be used to display information (accounting data) about completed or running Slurm jobs.
The following command will list all Slurm jobs executed by Galaxy on the current day:
sysadmin@usegalaxy ~ $ sacct -u galaxy
To display jobs from other days, use the --starttime <time> (-S <time>) option to show jobs in any state after this time and/or --endtime <time> (-E <time>) to show jobs before this time. Note that the results may include jobs that started before the specified start time or ended after the specified end time, as long as they were active at some point within this window. If unspecified, the default start time is midnight (today) and the default end time is "now".
sysadmin@usegalaxy ~ $ sacct -u galaxy -S 2022-03-10T00:00:00 -E 2022-03-10T23:59:59
You can use the --format option to specify what information to display. Run sacct --helpformat to see the names of all the fields you can add to the format list.
sysadmin@usegalaxy ~ $ sacct -u galaxy -X --format=JobID,JobName%30,Start,End,Elapsed,AllocCPUS,ReqMem,State,NodeList%20
JobID JobName Start End Elapsed AllocCPUS ReqMem State NodeList
------------ ------------------------------ ------------------- ------------------- ---------- ---------- ---------- ---------- --------------------
143115 g154575_lastzsubwrapper 2022-03-16T20:16:40 2022-03-17T05:18:18 09:01:38 1 4Gn COMPLETED slurm.usegalaxy.no
139170 g150413_fastqc 2022-03-16T21:15:20 Unknown 16:50:47 1 20Gn RUNNING slurm.usegalaxy.no
143210 g154577_sort1 2022-03-17T05:18:21 2022-03-17T05:18:29 00:00:08 1 4Gn COMPLETED slurm.usegalaxy.no
143211 g154578_mergeoverlap2 2022-03-17T05:18:21 2022-03-17T05:18:29 00:00:08 1 4Gn COMPLETED ecc1.usegalaxy.no
143212 g154579_mergeoverlap2 2022-03-17T05:18:31 2022-03-17T05:18:38 00:00:07 1 4Gn COMPLETED ecc1.usegalaxy.no
143213 g154580_fasta_compute_length 2022-03-17T05:18:40 2022-03-17T05:18:47 00:00:07 1 4Gn COMPLETED slurm.usegalaxy.no
143214 g154581_lastz_wrapper_2 2022-03-17T05:18:43 2022-03-17T05:28:05 00:09:22 1 8Gn COMPLETED nrec2.usegalaxy.no
143218 g154585_Cut1 2022-03-17T05:19:08 2022-03-17T05:19:15 00:00:07 1 4Gn COMPLETED slurm.usegalaxy.no
143219 g154586_seq_rename 2022-03-17T05:19:17 2022-03-17T05:19:24 00:00:07 1 4Gn COMPLETED slurm.usegalaxy.no
143220 g154673_lastzsubwrapper 2022-03-17T06:30:01 2022-03-17T07:00:19 00:30:18 1 4Gn COMPLETED nrec2.usegalaxy.no
143221 g154674_lastz_wrapper_2 2022-03-17T06:30:01 2022-03-17T06:32:16 00:02:15 1 8Gn COMPLETED slurm.usegalaxy.no
143222 g154685_lastzsubwrapper 2022-03-17T06:37:14 2022-03-17T06:50:34 00:13:20 1 4Gn COMPLETED slurm.usegalaxy.no
143223 g154686_lastz_wrapper_2 2022-03-17T06:37:14 2022-03-17T06:37:59 00:00:45 1 8Gn COMPLETED slurm.usegalaxy.no
143833 g155410_ncbi_blastn_wrapper 2022-03-17T11:46:59 2022-03-17T11:47:18 00:00:19 10 40Gn COMPLETED ecc1.usegalaxy.no
The sacct command with the parameters above can also be run with the alias list_jobs on the main node.
Use the -j <jobID> option to see information about a specific job:
sysadmin@usegalaxy ~ $ sacct -j <jobID>
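You can combine this with --format to see, for instance, the exit code and peak memory usage of a finished job (the field list below is just one useful selection):
sysadmin@usegalaxy ~ $ sacct -j <jobID> --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS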
The main node "usegalaxy.no" runs the Slurm controller daemon (slurmctld) and the Slurm database daemon (slurmdbd). Each compute node runs a Slurm daemon (slurmd). You can check the status of these services and start/restart/stop them, if necessary, with the following commands:
sudo systemctl status <service>
sudo systemctl start <service>
sudo systemctl restart <service>
sudo systemctl stop <service>
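For example, to check that the controller daemon is running on the main node:
sysadmin@usegalaxy ~ $ sudo systemctl status slurmctld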
The Slurm configuration files can be found in the directory /etc/slurm/:
- /etc/slurm/slurm.conf
- /etc/slurm/slurmdbd.conf
- /etc/slurm/cgroup.conf
Do not modify these files directly! They are created by our Ansible infrastructure playbook. (Note that the "slurmdbd.conf" file is only on the main node.)
Logs can be found in /var/log/slurm/.
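For instance, to follow the controller log on the main node (assuming the log file names set in our slurm.conf match the usual defaults):
sysadmin@usegalaxy ~ $ sudo tail -f /var/log/slurm/slurmctld.log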