Job management with Slurm

Most of the jobs on UseGalaxy.no are managed by Slurm and run inside a Singularity container with the dependencies required by the tool. Our compute backend consists of two permanent compute nodes plus extra nodes that are dynamically added and removed on demand by ECC (at least two ECC nodes should always be active).

  • slurm.usegalaxy.no ➤ Our main compute node running on a VM on our physical hardware (20 cores, 200GB memory)
  • nrec2.usegalaxy.no ➤ A permanent compute node from NREC (32 cores, 128GB memory)
  • eccN.usegalaxy.no ➤ Extra dynamic nodes managed by ECC (32 cores, 128GB or 256GB memory)

In addition, some special tools will run outside of Slurm on the local node, which is the VM that also runs Galaxy itself (usegalaxy.no). The reason for this is that they depend on some unstated requirements that we have so far not been able to include in the default container, but which are available in the virtual environment that Galaxy runs in.

Useful Slurm commands

Here is a cheat sheet of the most commonly used Slurm commands.

sinfo

The sinfo command will display information about nodes and partitions, including the state of each node.

sysadmin@usegalaxy ~ $ sinfo

PARTITION             AVAIL  TIMELIMIT  NODES  STATE NODELIST
usegalaxy_production*    up   infinite      1  down* ecc3.usegalaxy.no
usegalaxy_production*    up   infinite      1    mix slurm.usegalaxy.no
usegalaxy_production*    up   infinite      3   idle ecc1.usegalaxy.no,ecc2.usegalaxy.no,nrec2.usegalaxy.no
sysadmin@usegalaxy ~ $ sinfo --long --Node
Fri Dec 10 15:02:10 2021
NODELIST            NODES             PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
ecc1.usegalaxy.no       1 usegalaxy_production*        idle 32     32:1:1 257785        0      1   (null) none
ecc2.usegalaxy.no       1 usegalaxy_production*        idle 32     32:1:1 257785        0      1   (null) none
ecc3.usegalaxy.no       1 usegalaxy_production*       down* 32     32:1:1 257785        0      1   (null) Not responding
nrec2.usegalaxy.no      1 usegalaxy_production*        idle 32     32:1:1 128769        0      1   (null) none
slurm.usegalaxy.no      1 usegalaxy_production*       mixed 20     20:1:1 201342        0      1   (null) none

Some common node states (not an exhaustive list)

  • idle : No jobs are currently running on the node
  • alloc : All the CPUs on the node are allocated to jobs and it cannot accept more until some have completed
  • mix : The node is currently running jobs but it still has capacity to run more
  • drain : The node will not accept new jobs but will complete the jobs currently running
  • down : The node is unavailable for use and no jobs can run on it

An asterisk * after the state means that the node is not responding.
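
If you just need a quick overview of which nodes are down or drained and why, sinfo can also list the reasons directly. For example (the -R flag is short for --list-reasons):

sysadmin@usegalaxy ~ $ sinfo -R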

scontrol

The scontrol [command] command is used to view or modify Slurm configuration, including job, job step, node, partition, reservation, and overall system configuration. Commands that modify the configuration must be run with sudo.

If you need to take down a node, you can use the scontrol update command to set the state of the node to down or drain. The former will immediately take down the node and kill all currently running jobs, while the latter will allow running jobs to finish but will not assign new jobs to the node.

sysadmin@usegalaxy ~ $ sudo scontrol update nodename=<node> state=drain reason='problems with this node'

If a node is currently in a drain or down state, you can bring it up again by setting the state to resume.

sysadmin@usegalaxy ~ $ sudo scontrol update nodename=<node> state=resume
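
To inspect the current state of a single node, including the reason it was set to drain or down, you can for example run:

sysadmin@usegalaxy ~ $ scontrol show node <node>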

If a job has been queued but is not starting because the required resources are not available (it is in a pending state), you can change the job's requirements so that it can be scheduled, using the following command:

sysadmin@usegalaxy ~ $ sudo scontrol update job=<jobID> <LIST OF NEW SPECIFICATIONS>

For instance, to change the number of CPUs and memory requested by the job, run:

sysadmin@usegalaxy ~ $ sudo scontrol update job=<jobID> numcpus=<cores> minmemorynode=<MB>

This command can be especially useful on the test server, where each compute node only has 4 cores and 15 GB memory. The same "tool_destinations.yaml" configuration file is used on both test and production, and many of the tools are configured to require more resources than this and will thus fail to run on the test server. Specifying a lower number of CPUs and less memory will allow the job to be run on the test server anyway. You can see a full list of specifications that you can change here. To see a job's current specifications, run scontrol show job <jobID>.
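
As an illustration, the following (with placeholder values chosen to fit within a 4-core/15 GB test node) would shrink a pending job's request and then show the updated specifications:

sysadmin@usegalaxy ~ $ sudo scontrol update job=<jobID> numcpus=4 minmemorynode=14000
sysadmin@usegalaxy ~ $ scontrol show job <jobID>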

If a job fails on a node, it can be "held" by Slurm rather than being queued on a different node ("launch failed requeued held"). In this case, you can use the release command to start the job again. (Likewise, you can intentionally hold a pending job to prevent it from running with the hold command.)

sysadmin@usegalaxy ~ $ sudo scontrol release <jobID>
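
Holding a pending job, as mentioned above, might look like this:

sysadmin@usegalaxy ~ $ sudo scontrol hold <jobID>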

The requeue command will put a running, suspended or finished job back into a pending state, so it will execute again when resources become available. If some jobs are taking unexpectedly long to complete and you suspect it might be caused by issues with the compute node, you can drain the node and requeue the job to start it on another node instead. (Note that the job may not start right away even if resources are available, if the 'EligibleTime' of the job is in the future. You can check this attribute with scontrol show job <jobID>).

sysadmin@usegalaxy ~ $ sudo scontrol requeue <jobID>
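
To check the EligibleTime mentioned above, you can for instance filter the output of scontrol show job:

sysadmin@usegalaxy ~ $ scontrol show job <jobID> | grep -i eligibletime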

If a user has submitted lots of big jobs that are hogging all the resources on the compute backend, you can use requeuehold to remove some of them from the cluster and place them back in a pending state. The jobs will not be started until you explicitly release them at a later time.
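
A sketch of what that could look like (the job IDs below are placeholders; requeuehold accepts a comma-separated list of jobs):

sysadmin@usegalaxy ~ $ sudo scontrol requeuehold <jobID1>,<jobID2>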

squeue

The squeue command shows information about jobs located in the Slurm scheduling queue, including currently running jobs and jobs still waiting to be run.

sysadmin@usegalaxy ~ $ squeue
   JOBID  PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   143925 usegalaxy g155494_   galaxy PD       0:00      1 (Priority)
   143926 usegalaxy g155495_   galaxy PD       0:00      1 (Priority)
   143927 usegalaxy g155496_   galaxy PD       0:00      1 (Priority)
   143928 usegalaxy g155277_   galaxy PD       0:00      1 (Priority)
   143233 usegalaxy g154704_   galaxy  R    4:54:37      1 slurm.usegalaxy.no
   143268 usegalaxy g154737_   galaxy  R    4:39:10      1 ecc1.usegalaxy.no
   143561 usegalaxy g155184_   galaxy  R    3:16:55      1 ecc1.usegalaxy.no
   143759 usegalaxy g155376_   galaxy  R      33:54      1 slurm.usegalaxy.no
   143801 usegalaxy g155154_   galaxy  R       1:43      1 slurm.usegalaxy.no
   143906 usegalaxy g155476_   galaxy  R       2:24      1 slurm.usegalaxy.no

An "R" in the fifth column (STATE) means the job is currently running. If a job is still waiting for resources to become available, the state will be pending (PD). For the meaning of other states, see this list of job state codes.

scancel

Running jobs can be cancelled with the scancel <jobID> command. This must be run as sudo.

sysadmin@usegalaxy ~ $ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  130807 usegalaxy g139876_   galaxy  R   23:49:11      1 slurm.usegalaxy.no

sysadmin@usegalaxy ~ $ scancel 130807
scancel: error: Kill job error on job id 130807: Access/permission denied

sysadmin@usegalaxy ~ $ sudo scancel 130807

sysadmin@usegalaxy ~ $ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
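
scancel also accepts several job IDs at once, as well as filters such as user and job state. For example (be careful with the second command, as it would cancel all pending jobs submitted by the galaxy user):

sysadmin@usegalaxy ~ $ sudo scancel <jobID1> <jobID2>
sysadmin@usegalaxy ~ $ sudo scancel -u galaxy -t PENDING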

sstat

The sstat command can be used to display information about currently running Slurm jobs. The jobID below can be a single Slurm job ID or a comma-separated list of job IDs.

sysadmin@usegalaxy ~ $  sudo sstat --allsteps -j <jobID>  | less -S

You can also get information about the processes (PID numbers) associated with each job step using the --pidformat option (or -i for short).

sysadmin@usegalaxy ~ $  sudo sstat -i --allsteps -j <jobID>

Use the --format option to just include a few selected fields in the output. Run sstat --helpformat to see which fields are available.

sysadmin@usegalaxy ~ $ sudo sstat --format=JobID,NTasks,AveCPU,AvePages,AveRSS,AveVMSize,MinCPUNode,MaxDiskRead,MaxDiskWrite,MinCPU --allsteps -j <jobID>

sacct

The sacct command can be used to display information (accounting data) about completed or running Slurm jobs.

The following command will list all Slurm jobs executed by Galaxy on the current day:

sysadmin@usegalaxy ~ $ sacct -u galaxy

To display jobs from other days, use the --starttime <time> (-S <time>) option to show jobs in any state after this time and/or --endtime <time> (-E <time>) to show jobs before this time. Note that the results include any job that was active at some point during the specified interval, so jobs that started before the start time or ended after the end time may also be listed. If unspecified, the default start time is midnight (today) and the default end time is "now".

sysadmin@usegalaxy ~ $ sacct -u galaxy -S 2022-03-10T00:00:00 -E 2022-03-10T23:59:59

You can use the --format option to specify what information to display. Run sacct --helpformat to see the names of all the fields you can add to the format list.

sysadmin@usegalaxy ~ $ sacct -u galaxy -X --format=JobID,JobName%30,Start,End,Elapsed,AllocCPUS,ReqMem,State,NodeList%20

       JobID                        JobName               Start                 End    Elapsed  AllocCPUS     ReqMem       State             NodeList
------------ ------------------------------ ------------------- ------------------- ---------- ----------  ---------- ---------- --------------------
143115              g154575_lastzsubwrapper 2022-03-16T20:16:40 2022-03-17T05:18:18   09:01:38          1        4Gn   COMPLETED   slurm.usegalaxy.no
139170                       g150413_fastqc 2022-03-16T21:15:20             Unknown   16:50:47          1       20Gn     RUNNING   slurm.usegalaxy.no
143210                        g154577_sort1 2022-03-17T05:18:21 2022-03-17T05:18:29   00:00:08          1        4Gn   COMPLETED   slurm.usegalaxy.no
143211                g154578_mergeoverlap2 2022-03-17T05:18:21 2022-03-17T05:18:29   00:00:08          1        4Gn   COMPLETED    ecc1.usegalaxy.no
143212                g154579_mergeoverlap2 2022-03-17T05:18:31 2022-03-17T05:18:38   00:00:07          1        4Gn   COMPLETED    ecc1.usegalaxy.no
143213         g154580_fasta_compute_length 2022-03-17T05:18:40 2022-03-17T05:18:47   00:00:07          1        4Gn   COMPLETED   slurm.usegalaxy.no
143214              g154581_lastz_wrapper_2 2022-03-17T05:18:43 2022-03-17T05:28:05   00:09:22          1        8Gn   COMPLETED   nrec2.usegalaxy.no
143218                         g154585_Cut1 2022-03-17T05:19:08 2022-03-17T05:19:15   00:00:07          1        4Gn   COMPLETED   slurm.usegalaxy.no
143219                   g154586_seq_rename 2022-03-17T05:19:17 2022-03-17T05:19:24   00:00:07          1        4Gn   COMPLETED   slurm.usegalaxy.no
143220              g154673_lastzsubwrapper 2022-03-17T06:30:01 2022-03-17T07:00:19   00:30:18          1        4Gn   COMPLETED   nrec2.usegalaxy.no
143221              g154674_lastz_wrapper_2 2022-03-17T06:30:01 2022-03-17T06:32:16   00:02:15          1        8Gn   COMPLETED   slurm.usegalaxy.no
143222              g154685_lastzsubwrapper 2022-03-17T06:37:14 2022-03-17T06:50:34   00:13:20          1        4Gn   COMPLETED   slurm.usegalaxy.no
143223              g154686_lastz_wrapper_2 2022-03-17T06:37:14 2022-03-17T06:37:59   00:00:45          1        8Gn   COMPLETED   slurm.usegalaxy.no
143833          g155410_ncbi_blastn_wrapper 2022-03-17T11:46:59 2022-03-17T11:47:18   00:00:19         10       40Gn   COMPLETED    ecc1.usegalaxy.no

The sacct command with the parameters above can also be run with the alias list_jobs on the main node.

Use the -j <jobID> option to see information about a specific job:

sysadmin@usegalaxy ~ $ sacct -j <jobID>
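
For a completed job, where sstat no longer works, a format along these lines can be useful to see the exit code and peak memory usage of the job steps:

sysadmin@usegalaxy ~ $ sacct -j <jobID> --format=JobID,JobName%30,State,ExitCode,Elapsed,MaxRSS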

Slurm services

The main node "usegalaxy.no" runs the Slurm controller daemon (slurmctld) and the Slurm database daemon (slurmdbd). Each compute node runs a Slurm daemon (slurmd). You can check the status of these services and start/restart/stop them, if necessary, with the following commands:

sudo systemctl status <service>
sudo systemctl start <service>
sudo systemctl restart <service>
sudo systemctl stop <service>
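
For example, assuming the standard unit names, you would check the controller on the main node and restart the Slurm daemon on a compute node like this:

sudo systemctl status slurmctld     # on the main node
sudo systemctl restart slurmd       # on a compute node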

Slurm configuration

The Slurm configuration files can be found in the directory /etc/slurm/:

  • /etc/slurm/slurm.conf
  • /etc/slurm/slurmdbd.conf
  • /etc/slurm/cgroup.conf

Do not modify these files directly! They are created by our Ansible infrastructure playbook. (Note that the "slurmdbd.conf" file is only present on the main node.)
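
If you want to see the configuration that Slurm is actually running with (rather than reading the files), you can for example dump it with:

sysadmin@usegalaxy ~ $ scontrol show config | less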

Logs can be found in /var/log/slurm/.