The Language and Voice Laboratory (LVL) runs a tiny computing “cluster” called
Terra. This cluster consists of a few physical nodes: terra, torpaq and gaia.
Access is granted on request by an LVL sysadmin. Once you have a user account, you can log into the main node.
Additional questions can be asked on the #terra channel on Slack.
The LVL cluster uses Slurm to handle compute job scheduling and resource allocation. All resource-intensive tasks must go through the scheduling system, and jobs should not request substantially more resources than they need.
The command sbatch is used to submit batch jobs to the scheduler. This is
the most common way to run tasks on the cluster. A batch job is described by a
batch script and the command-line arguments to sbatch.
A batch script is a bash script with some special preprocessor directives, as seen in the example below.
#!/bin/bash
#SBATCH --gres=gpu:titanx:2
#SBATCH --mem=12G
#SBATCH --output=test-sbatch.log
echo "I have these GPUs: $CUDA_VISIBLE_DEVICES"
echo "On this machine $(hostname)"
exit 0
We send this job to the scheduler with:
sbatch example-job.sbatch
This defines a job that will request two NVIDIA Titan X GPUs and 12 GB of memory,
and write stdout/stderr to the file test-sbatch.log in the current
directory. Once the scheduler is able to allocate the necessary resources it
will execute the job, writing the IDs of the allocated GPUs and the hostname
of the allocated node to test-sbatch.log.
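Since a job is described by both the script and the arguments to sbatch, a one-off variation does not require editing the script: in Slurm, options given on the command line take precedence over the matching #SBATCH lines. The sketch below is only an illustration. The memory value and log file name are made up, and the fallback branch exists so the snippet is harmless when run away from the cluster.

```shell
# Options on the sbatch command line override matching #SBATCH directives.
# The values (4G, small-run.log) are illustrative, not site recommendations.
submit_cmd="sbatch --mem=4G --output=small-run.log example-job.sbatch"

if command -v sbatch >/dev/null 2>&1 && [ -f example-job.sbatch ]; then
    # On Terra: submit with the overridden memory request and log file.
    $submit_cmd
else
    # Elsewhere (or without the script present): only show the command.
    echo "would run: $submit_cmd"
fi
```

The GPU request from the script is left untouched here, because only the options repeated on the command line are overridden.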
We can use sacct to see the job history and squeue to see queued and
running jobs.
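A short sketch of how these two commands might be used day to day. The choice of columns is just one reasonable selection, and the guard is there so the snippet does nothing harmful on a machine without the Slurm client tools.

```shell
# Record whether the Slurm client tools are available on this machine.
if command -v squeue >/dev/null 2>&1 && command -v sacct >/dev/null 2>&1; then
    have_slurm="yes"
else
    have_slurm="no"
fi

if [ "$have_slurm" = "yes" ]; then
    # Only your own queued and running jobs.
    squeue -u "$USER"
    # Accounting records for your jobs (by default, those since midnight).
    sacct -u "$USER" --format=JobID,JobName,State,Elapsed,MaxRSS
else
    echo "Slurm client tools not found on this machine"
fi
```

Comparing MaxRSS for finished jobs against the memory that was requested is one way to check whether a job asked for more than it needed.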
There are a few file systems available on Terra. None of these are backed
up. All except /scratch use RAID for fault tolerance.
Mount path | Purpose | Size | Speed | Local node |
---|---|---|---|---|
/data | Shared datasets, models and archives. Read-only for users. | 2.7 TiB | Fast reads, slow writes | terra |
/scratch | “Unimportant” temporary files with many writes and reads. | 2 TiB | Fastest | terra |
/mnt/scratch | Links to /scratch for legacy reasons. | | | |
/work | More important temporary files. | 3.4 TiB | Fastest reads, fast writes | torpaq |
/home | Code, configuration files, etc. | 5.4 TiB | Slow | terra |
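
One way to act on the table above is to keep a job's intermediate files on /scratch and copy only the results worth keeping to /work. The sketch below assumes per-user subdirectories (/scratch/$USER and /work/$USER), which this document does not specify. It falls back to throwaway temporary directories when those paths are not writable, so it also runs off the cluster.

```shell
# Assumed layout: per-user directories under /scratch and /work (hypothetical).
user="${USER:-$(id -un)}"
scratch_root="/scratch/$user"
work_root="/work/$user"

# Off the cluster (or if the layout differs) use throwaway directories instead.
if ! { mkdir -p "$scratch_root" "$work_root" 2>/dev/null \
        && [ -w "$scratch_root" ] && [ -w "$work_root" ]; }; then
    scratch_root="$(mktemp -d)"
    work_root="$(mktemp -d)"
fi

# Private working directory for this job on the fast, non-redundant disk.
job_dir="$(mktemp -d "$scratch_root/job.XXXXXX")"

# Stand-in for the real computation: write an intermediate and a result file.
echo "intermediate" > "$job_dir/tmp.txt"
echo "final result" > "$job_dir/result.txt"

# Keep only the result on /work, then remove the scratch directory.
cp "$job_dir/result.txt" "$work_root/result.txt"
rm -rf "$job_dir"
```

Since none of the file systems are backed up, anything that cannot be regenerated should also be kept somewhere outside Terra.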
Users have access to a few read-only folders on Terra. These folders hold frequently used corpora, models and tools.
Path | Purpose |
---|---|
/data | Datasets and data used by and created by LVL |
/models | Pretrained models from LVL or other sources |
/data/tools | Shared tools and libraries |
If you want to add your own data, models or libraries, contact the admins.
Singularity is a container solution for scientific computing that allows unprivileged use of containers. It can build its own images from scratch or from ready-made Docker images.
A user can build a containerized application or project on their own machine and then run it on Terra in a Slurm batch job.
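The workflow above might look roughly as follows. This is a sketch under assumptions: the image name python.sif, the Docker source python:3.10-slim and the batch script name my-container-job.sbatch are all placeholders, and nothing here reflects Terra-specific configuration. It relies only on the standard singularity pull and singularity exec subcommands.

```shell
# Placeholder names; substitute your own project image and script.
image="python.sif"
job_script="my-container-job.sbatch"

# Write a batch script that runs a command inside the container.
# The unquoted EOF lets $image expand into the script text.
cat > "$job_script" <<EOF
#!/bin/bash
#SBATCH --mem=4G
#SBATCH --output=container-job.log
singularity exec $image python3 --version
EOF

# Where Singularity is installed (for example your own machine):
# convert a ready-made Docker image into a Singularity image file.
if command -v singularity >/dev/null 2>&1 && [ ! -f "$image" ]; then
    singularity pull "$image" docker://python:3.10-slim
fi

# On Terra, with the image copied next to the script: submit the job.
if command -v sbatch >/dev/null 2>&1 && [ -f "$image" ]; then
    sbatch "$job_script"
else
    echo "wrote $job_script; submit it on Terra with: sbatch $job_script"
fi
```

For GPU jobs, singularity exec accepts the --nv flag to expose the NVIDIA devices inside the container, combined with a --gres request in the batch script.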
Jupyter notebooks have become a popular way of doing scientific computing and interactive machine learning.
LVL runs a JupyterHub at https://terra.hir.is (on the RU intranet; you will have to accept the self-signed certificate), which lets users spin up notebook servers through Slurm.
The notebook server runs in a container using an image with a Python 3.7 Conda base environment. The Conda tab lets you create new environments, and packages can be added to an environment either through the UI or from a notebook running in that environment.
Apart from compiling things yourself, an easy way to install the tools and libraries you need is the Conda package manager.
To use it you first have to add it to your environment:
source /data/tools/anaconda/etc/profile.d/conda.sh
Then, to always have conda available you can add it to your bash profile with:
conda init
Let’s say that for some reason you need to use pdftotext from poppler-utils. You can then create an environment specifically for that:
conda create -n pdf-stuff poppler-utils
This will create an environment named pdf-stuff with the package
poppler-utils and all of its dependencies installed. To activate it you run:
conda activate pdf-stuff
To verify that it has been loaded:
whereis pdftotext
pdftotext: /home/staff/rkjaran/.conda/envs/pdf-stuff/bin/pdftotext
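To use such an environment inside a Slurm job, the batch script can source the same conda.sh and activate the environment before running the tool. A minimal sketch, assuming the pdf-stuff environment from above exists. The script name pdf-job.sbatch, the memory request and the input file paper.pdf are hypothetical.

```shell
job_script="pdf-job.sbatch"

# The quoted 'EOF' keeps the script text literal (nothing is expanded here).
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --mem=2G
#SBATCH --output=pdf-job.log
# Job shells are non-interactive and typically do not read ~/.bashrc,
# so load conda explicitly before activating the environment.
source /data/tools/anaconda/etc/profile.d/conda.sh
conda activate pdf-stuff
pdftotext paper.pdf paper.txt
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"
else
    echo "wrote $job_script; submit it on Terra with: sbatch $job_script"
fi
```

Activating the environment inside the script, rather than relying on the shell you submitted from, keeps the job reproducible regardless of which environment happened to be active at submission time.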