Notes on using the Sheffield High Performance Computing facilities, `sharc` and `iceberg`.
Improve these notes by editing them on GitHub or by submitting an issue.

The official Research Computing Group documentation is http://docs.hpc.shef.ac.uk/en/latest/iceberg/. This document is intended as more practical notes, particularly geared towards our group's needs.
`sharc` is the name of the new (as of 2017) HPC facility; it is generally preferred (faster, more reliable, and so on).
`iceberg` is the name of the old facility; it might still be useful as it may have more packages installed.
To access `sharc` (or `iceberg`) you will need to SSH in. From the command line:

```
ssh -X md1xdrj@sharc.sheffield.ac.uk
```

Don't use `md1xdrj`, that's my username; you'll have to use yours.
Your username and password are the same ones you use to access MUSE, unless you have changed them.
To use `iceberg`, replace `sharc.sheffield.ac.uk` with `iceberg.sheffield.ac.uk`.
(You can make this as short as `ssh sharc` if you're okay with editing some `ssh` config files; see my notes about making it more convenient.)
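For example, the alias can be set up with an entry like this in `~/.ssh/config` (a minimal sketch; swap in your own username, and drop `ForwardX11` if you don't need X):

```
Host sharc
    HostName sharc.sheffield.ac.uk
    User md1xdrj
    ForwardX11 yes
```

With that in place, `ssh sharc` does the same as the full command above.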
Use `qsub my-job-script` to submit a job.
There are many parameters (see `man qsub` for the ultimate source of truth).
Iceberg's official documentation is here: http://docs.hpc.shef.ac.uk/en/latest/hpc/scheduler/sge.html
Parameters can be specified either on the `qsub` command line, or in the shell script itself on a line that begins with `#$` (see example below).
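To illustrate, here is a minimal job script (a sketch only; the job name and resource values are made up, so adjust them to suit your job):

```bash
#!/bin/bash
#$ -N my-job              # job name (illustrative)
#$ -l h_rt=01:00:00       # request 1 hour of run time
#$ -l rmem=4G             # request 4 GB of real memory

echo "Running on $(hostname)"
```

Submit it with `qsub my-job-script` as above.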
Parameters that you use repeatedly can be placed in your `~/.sge_request` file.
I recommend putting at least the following in that file:

```
-M your.email@sheffield.ac.uk
-m e
```

(`-M` sets the address that job notifications are sent to, so use your own email address; `-m e` asks for an email when each job ends.)
Other options are probably best set in your script itself.
On `sharc`

Hide Lab have access to the RSE queue (which is a pool of nodes also shared with other groups). Put `-P rse` in the job parameters.
You can either do this each time you submit a job:

```
qsub -P rse my-job-script
```

or you can add it to the top section of the job script itself:

```
#$ -P rse
```
(Using `-P` like this will cause the job to run on the RSE queue if it is available; otherwise it will run on the general queue.)
On `iceberg`

Hide Lab used to rent a node, which we could access using `-P hidelab`, but we stopped renting the node in May 2018.
`qstat` (on `iceberg`: `qstat -u $USER`) will tell you about the jobs that are either running or waiting to run.
If a job has already finished, it won't appear in this list.
The list also includes interactive shell sessions (such as those started with `qrshx`); they are jobs that are attached to an interactive terminal.
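For reference, an interactive session can be started like this (a sketch; the memory request is illustrative, and `qrshx` also forwards X11 so graphical programs work):

```
qrshx -l rmem=8G
```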
`qtop` shows similar information in more detail and is quite fun (but it does not work on `sharc`).
You can inspect a queue (`rse.q` is the queue for the `rse` project) like this:

On `sharc`

```
qstat -s r -q rse.q -u \*
```
In the above command, `-s r` means we see only running jobs.
If you want to see other jobs (for example, the priority of your own jobs that are waiting), you can remove the `-s r` part.
It's useful to know that the `state` column tells you whether a job is running (`r`) or waiting (`qw`), and that `slots` is the number of CPU cores that have been requested.
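To make that concrete, the output looks roughly like this (an illustrative mock-up, not real output; the job numbers, names, and dates are invented):

```
job-ID  prior    name     user      state  submit/start at      queue          slots
-------------------------------------------------------------------------------------
123456  0.50500  my-job   md1xdrj   r      05/30/2018 10:00:00  rse.q@node001  4
123457  0.00000  my-job2  md1xdrj   qw     05/30/2018 10:05:00                 16
```

Here the first job is running on 4 cores in `rse.q`, and the second is waiting for 16 cores.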
I have:

- `/home/md1xdrj`: 10 gigabytes
- `/data/md1xdrj`: 100 gigabytes

You will have similar directories using your username.
We have also purchased additional storage that Hide Lab can use.
It's available on `sharc` (and `iceberg`) at `/shared/hidelab2`.
This is some sort of magic automount directory, so it doesn't appear in `ls` and so on until you explicitly `cd` to it or use it.
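In other words (a quick sketch of the behaviour, assuming the directory hasn't been accessed yet in your session):

```
ls /shared            # hidelab2 may not be listed yet
cd /shared/hidelab2   # accessing the path triggers the automount
ls /shared            # now hidelab2 appears
```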
More information about the different sorts of storage is available on the official HPC filestore page.
You can see if you're using lots of storage with:

```
df -h ~
df -h /data/md1xdrj
df -h /shared/hidelab2
```

or similar.