Slurm-Mail

Author: Neil Munday (neil at mundayweb.com)

Repository: https://github.com/neilmunday/slurm-mail

Contents

  1. Introduction
  2. Requirements
  3. Installation
  4. Configuration
  5. SMTP Settings
  6. Customising E-mails
  7. Validating E-mails
  8. Including Job Output in E-mails
  9. Job Arrays
  10. GECOS Field Usage
  11. Development
  12. Linting
  13. Testing
  14. Upgrading from Slurm-Mail version 3 to 4
  15. Upgrading from Slurm-Mail version 4.0-4.9 to 4.10
  16. Troubleshooting
  17. Contributors

Introduction

E-mail notifications from Slurm are rather brief and all the information is contained in the subject of the e-mail - the body is empty.

Slurm-Mail addresses this by providing a drop-in replacement for Slurm's e-mails. It gives users much more information about their jobs via HTML e-mails that contain the following information:

  • Start/End
  • Job name
  • Partition
  • Work dir
  • Elapsed time
  • Exit code
  • Std out file path
  • Std err file path
  • No. of nodes used
  • Node list
  • Requested memory per node
  • Maximum memory usage per node
  • CPU efficiency
  • Wallclock
  • Wallclock accuracy

E-mails can be easily customised to your needs using the provided templates (see below).

You can also opt to include a number of lines from the end of the job's output files in the job completion e-mails (see below).

Requirements

  • cron
  • logrotate
  • Python 3.6 or newer
  • Slurm 22, 23 or 24
  • A working e-mail server

Note: earlier versions of Slurm may work but are not tested with this release of Slurm-Mail.

Installation

Amazon Linux, RedHat and SUSE Based Operating Systems

For each release of Slurm-Mail, RPMs for Amazon Linux 2 and 2023, RedHat 7/8/9 and SUSE 15 based operating systems are provided at neilmunday.github.io/slurm-mail/repo.

Amazon Linux 2

sudo wget -O /etc/yum.repos.d/slurm-mail.repo https://neilmunday.github.io/slurm-mail/repo/slurm-mail.amnz2.repo
sudo yum install slurm-mail

Amazon Linux 2023

sudo wget -O /etc/yum.repos.d/slurm-mail.repo https://neilmunday.github.io/slurm-mail/repo/slurm-mail.amnz2023.repo
sudo dnf install slurm-mail

RedHat 7 / CentOS 7

sudo wget -O /etc/yum.repos.d/slurm-mail.repo https://neilmunday.github.io/slurm-mail/repo/slurm-mail.el7.repo
sudo yum install slurm-mail

RedHat 8 / Rocky Linux 8

sudo wget -O /etc/yum.repos.d/slurm-mail.repo https://neilmunday.github.io/slurm-mail/repo/slurm-mail.el8.repo
sudo dnf install slurm-mail

RedHat 9 / Rocky Linux 9

sudo wget -O /etc/yum.repos.d/slurm-mail.repo https://neilmunday.github.io/slurm-mail/repo/slurm-mail.el9.repo
sudo dnf install slurm-mail

OpenSUSE 15 / SLES 15

sudo zypper addrepo --no-gpgcheck --refresh https://neilmunday.github.io/slurm-mail/repo/sl15 slurm-mail
sudo zypper install slurm-mail

Other RPM Based Operating Systems

For other operating systems that use RPM packages you can create a package for your OS like so:

dnf -y install python36 rpm-build tar
git clone https://github.com/neilmunday/slurm-mail
slurm-mail/build-tools/build-rpm.sh

The RPM will be written to ~/rpmbuild/RPMS/noarch.

Ubuntu 20, 22 and 24

Pre-built Ubuntu 20, 22 and 24 packages are provided at neilmunday.github.io/slurm-mail/repo.

Add the line for your Ubuntu release to your /etc/apt/sources.list file:

Ubuntu 20

deb [trusted=yes] https://neilmunday.github.io/slurm-mail/repo/ub20 ./

Ubuntu 22

deb [trusted=yes] https://neilmunday.github.io/slurm-mail/repo/ub22 ./

Ubuntu 24

deb [trusted=yes] https://neilmunday.github.io/slurm-mail/repo/ub24 ./

Install

Then update the package index and install using apt-get:

apt-get update
apt-get install slurm-mail

Other Debian-based Operating Systems

For other Debian variants you can create a package for your OS like so:

sudo apt-get install -y fakeroot dh-python lintian lsb-release python3 python3-stdeb
git clone https://github.com/neilmunday/slurm-mail
slurm-mail/build-tools/build-deb.sh

At the end of the execution the location of the built package will be written to stdout.

For example, to install the generated package:

apt-get install -y cron logrotate python3 slurm-client
dpkg --force-all -i /tmp/slurm-mail_4.3-ubuntu1_all.deb

From source (as root)

git clone https://github.com/neilmunday/slurm-mail
cd slurm-mail
python setup.py install
cp etc/logrotate.d/slurm-mail /etc/logrotate.d/
cp etc/cron.d/slurm-mail /etc/cron.d/
install -d -m 700 -o slurm -g slurm /var/log/slurm-mail

Note: Depending on your operating system's Python set-up, it is possible that setuptools might install Slurm-Mail to /usr/local rather than /usr.

AWS Parallel Cluster

Slurm-Mail can be installed and automatically configured to work with AWS Parallel Cluster by using this recipe.

Configuration

Edit /etc/slurm-mail/slurm-mail.conf to suit your needs. For example, check that the location of sacct is correct. If you are installing from source check that the log and spool directories are set to your desired values.

Change the value of MailProg in your slurm.conf file to /usr/bin/slurm-spool-mail. By default the Slurm config file will be located at /etc/slurm/slurm.conf.

Restart slurmctld:

systemctl restart slurmctld

Slurm-Mail will now log e-mail requests from Slurm users to the Slurm-Mail spool directory /var/spool/slurm-mail.

The cron job created during installation at /etc/cron.d/slurm-mail will execute once per minute to process the spool files, thus making sure that slurmctld is not blocked by processing e-mails.

SMTP Settings

By default, Slurm-Mail sends e-mails to a mail server running on the same host that Slurm-Mail is installed on, i.e. localhost.

You can edit the SMTP configuration options in /etc/slurm-mail/slurm-mail.conf. For example, to send e-mails via Gmail's SMTP server, use the following settings:

smtpServer = smtp.gmail.com
smtpPort = 587
smtpUseTls = yes
smtpUserName = your_name@gmail.com
smtpPassword = your_gmail_password

NOTE: As this file will contain your Gmail password, make sure that it has the correct owner, group and file access permissions.

If your SMTP server does not require a login, leave smtpUserName and smtpPassword empty, i.e.:

smtpUserName =
smtpPassword =

For SMTP servers that use SSL rather than STARTTLS, set smtpUseSsl = yes.
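The sketch below uses Python's standard smtplib module to show the practical difference between smtpUseTls and smtpUseSsl. It is not Slurm-Mail's own code. It assumes the options map onto smtplib in the usual way: SSL means an encrypted connection from the first byte (smtplib.SMTP_SSL), and TLS means a plain connection upgraded with STARTTLS. The helper names connection_plan and send_via_smtp are hypothetical.

```python
import smtplib


def connection_plan(use_tls, use_ssl, username):
    """Work out how to connect, mirroring smtpUseTls / smtpUseSsl / smtpUserName.

    Assumption for this sketch: if both SSL and TLS are enabled, SSL wins,
    because a connection that is already encrypted does not need STARTTLS.
    """
    return {
        "cls": smtplib.SMTP_SSL if use_ssl else smtplib.SMTP,
        "starttls": bool(use_tls and not use_ssl),
        "login": bool(username),  # an empty user name means "no login"
    }


def send_via_smtp(msg, server, port, use_tls=False, use_ssl=False,
                  username="", password=""):
    """Send an email.message.EmailMessage using the plan above (hypothetical helper)."""
    plan = connection_plan(use_tls, use_ssl, username)
    with plan["cls"](server, port) as conn:
        if plan["starttls"]:
            conn.starttls()  # upgrade the plain connection to TLS
        if plan["login"]:
            conn.login(username, password)
        conn.send_message(msg)


if __name__ == "__main__":
    # The Gmail example above: port 587, STARTTLS, with a login.
    print(connection_plan(use_tls=True, use_ssl=False, username="your_name@gmail.com"))
    # A local relay that needs neither encryption nor a login.
    print(connection_plan(use_tls=False, use_ssl=False, username=""))
```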

E-mail retries

By default, Slurm-Mail will attempt to resend e-mails when a previous attempt failed. This can result in repeated failed e-mail attempts if, for example, a user has specified an invalid e-mail address.

If you would prefer to disable this feature, set the following option in /etc/slurm-mail/slurm-mail.conf:

retryOnFailure = no

In either case, errors for failed e-mail deliveries are always logged in /var/log/slurm-mail/slurm-send-mail.log.

E-mail headers

To add extra headers to outgoing e-mails, set the emailHeaders option in /etc/slurm-mail/slurm-mail.conf.

Customising E-mails

Templates

Slurm-Mail uses Python's string.Template class to create the e-mails it sends. The /etc/slurm-mail/templates directory contains two subdirectories, html and text. Each holds the template files listed below, and you can edit them to customise e-mails to your needs. Templates under the html directory are used for HTML e-mails, and those under text are used for plain text e-mails.

| Filename | Template purpose |
| --- | --- |
| ended-array.tpl | Used for jobs in an array that have finished. |
| ended-array_summary.tpl | Used when all jobs in an array have finished. |
| ended-hetjob.tpl | Used for the leader job in a heterogeneous job that has ended. |
| ended.tpl | Used for jobs that have finished. |
| invalid-dependency.tpl | Used when a job has an invalid dependency. |
| job-table.tpl | Used to create the job info table in e-mails. |
| never-ran.tpl | Used for jobs that never ran. |
| signature.tpl | Used to create the e-mail signature. |
| staged-out.tpl | Used when a job's burst buffer stage has completed. |
| started.tpl | Used for jobs that have started. |
| started-hetjob.tpl | Used for the leader job in a heterogeneous job that has started. |
| started-array-summary.tpl | Used when the first job in an array has started. |
| started-array.tpl | Used for the first job in an array that has started. |
| time.tpl | Used when a job reaches a percentage of its time limit. |
| tres.tpl | Used to add trackable resources (TRES) information to e-mails. |

Each template has a number of variables that can be used when generating e-mails. Please see TEMPLATES for further details.
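The templates are processed with string.Template, so placeholders take the form $NAME or ${NAME}, and a literal dollar sign must be written as $$. The sketch below shows this with hypothetical variable names (USER, JOB_ID, STATE). The real variables available in each .tpl file are the ones listed in TEMPLATES. Whether Slurm-Mail calls substitute() or safe_substitute() is also an assumption here.

```python
from string import Template

# Hypothetical template text; see TEMPLATES for the real variable names.
TEMPLATE_TEXT = (
    "Dear $USER,\n"
    "Job ${JOB_ID} has $STATE.\n"
    "Literal dollar sign: $$HOME is not expanded.\n"  # $$ renders as a single $
)


def render(template_text, values):
    """Fill $-placeholders; safe_substitute leaves unknown ones untouched."""
    return Template(template_text).safe_substitute(values)


if __name__ == "__main__":
    print(render(TEMPLATE_TEXT, {"USER": "Jane", "JOB_ID": "1234", "STATE": "ended"}))
```

With safe_substitute(), a misspelled placeholder is left visible in the output instead of raising an error. That makes typos easy to spot when testing an edited template.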

Styling

You can adjust the font style, size, colours etc. by editing the Cascading Style Sheet (CSS) file /etc/slurm-mail/style.css used for generating the e-mails.

Date/time format

To change the date/time format used for job start and end times in the e-mails, change the datetimeFormat configuration option in /etc/slurm-mail/slurm-mail.conf. The format string uses the same format codes as Python's datetime.strftime method.
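Before editing datetimeFormat, you can preview a candidate value by trying it directly in Python. The format strings below are examples only and are not necessarily Slurm-Mail's default. The preview helper is hypothetical.

```python
from datetime import datetime

# Example candidate values for datetimeFormat (not Slurm-Mail's defaults).
EXAMPLE_FORMATS = ["%d/%m/%Y %H:%M:%S", "%Y-%m-%d %H:%M", "%a %d %b %Y %H:%M"]


def preview(fmt, when):
    """Show how a job start/end time would look with the given format string."""
    return when.strftime(fmt)


if __name__ == "__main__":
    sample = datetime(2024, 3, 5, 14, 7, 9)
    for fmt in EXAMPLE_FORMATS:
        print(f"{fmt!r:24} -> {preview(fmt, sample)}")
```

Codes such as %a and %b produce day and month names that depend on the process locale, so their output can differ between hosts.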

E-mail subject

To change the subject of the e-mails, change the emailSubject configuration option in /etc/slurm-mail/slurm-mail.conf. You can use the following placeholders in the string:

| Placeholder | Value |
| --- | --- |
| $CLUSTER | The name of the cluster |
| $JOB_NAME | The name of the job |
| $JOB_ID | The Slurm ID of the job |
| $STATE | The state of the job |
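The sketch below shows how an emailSubject value that uses these placeholders expands. It assumes the placeholders are substituted in the same string.Template style as the e-mail templates. The subject string and job details are hypothetical, and build_subject is an illustrative helper, not part of Slurm-Mail.

```python
from string import Template


def build_subject(email_subject, cluster, job_name, job_id, state):
    """Expand the documented subject placeholders (illustrative sketch)."""
    return Template(email_subject).safe_substitute(
        CLUSTER=cluster, JOB_NAME=job_name, JOB_ID=job_id, STATE=state
    )


if __name__ == "__main__":
    # Hypothetical emailSubject value and job details.
    subject = "[$CLUSTER] Job $JOB_ID ($JOB_NAME): $STATE"
    print(build_subject(subject, "hpc1", "my-job", "1234", "Ended"))
```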

Validating E-mails

By default, Slurm-Mail does not perform any checks on the destination e-mail address (i.e. the value supplied to sbatch via --mail-user). If you would like Slurm-Mail to send e-mails only for jobs that correspond to a valid e-mail address (e.g. user@example.com), set the validateEmail option in /etc/slurm-mail/slurm-mail.conf to true. E-mail addresses that fail this check are logged as errors in /var/log/slurm-mail/slurm-send-mail.log.

The regular expression used to validate e-mail addresses can be configured by adjusting the emailRegEx value in /etc/slurm-mail/slurm-mail.conf.
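The sketch below shows the kind of check emailRegEx drives, using Python's re module. The pattern is a hypothetical example and is not Slurm-Mail's default. The pattern is anchored with ^ and $ so that it behaves the same whether it is applied with re.match or re.fullmatch.

```python
import re

# Hypothetical pattern for illustration; the value actually used is
# whatever emailRegEx is set to in slurm-mail.conf.
EMAIL_REGEX = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"


def is_valid_email(address, pattern=EMAIL_REGEX):
    """Return True if the whole address matches the pattern."""
    return re.fullmatch(pattern, address) is not None


if __name__ == "__main__":
    for addr in ["jane.doe@example.com", "jane", "jane@localhost"]:
        print(f"{addr}: {'valid' if is_valid_email(addr) else 'rejected'}")
```

This pattern rejects a bare user name such as jane, which is what sbatch uses by default when --mail-user is omitted. Check that your chosen emailRegEx behaves as you intend for such addresses.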

Including Job Output in E-mails

In /etc/slurm-mail/slurm-mail.conf, you can set includeOutputLines to the number of lines to include from the end of each job's standard output and standard error files.
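Taking the last N lines of a potentially large output file is a common use of collections.deque with a maxlen. The sketch below illustrates the idea and is not Slurm-Mail's implementation. The helper names last_lines and tail_file are hypothetical.

```python
from collections import deque


def last_lines(lines, n):
    """Return the last n lines from any iterable of lines, without newlines.

    A deque with maxlen=n discards older lines as new ones arrive, so at most
    n lines are held in memory while the input is streamed.
    """
    if n <= 0:
        return []
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def tail_file(path, n):
    """Apply last_lines to a job output file, e.g. with n = includeOutputLines."""
    with open(path, "r", errors="replace") as handle:
        return last_lines(handle, n)
```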

Notes:

  • If the user has decided to use the same file for both standard output and standard error, there will be only one section of job output in the job completion e-mails.
  • Job output can only be included if the process that is running slurm-send-mail.py is able to read the user's output files.
  • Because of the way scontrol reports filenames that use Slurm's filename patterns, only the following patterns are supported when including job output in e-mails: %A, %a, %j, %u and %x.
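The sketch below expands the supported subset of Slurm's filename patterns. It follows sbatch's meanings: %A is the array master job ID, %a the array index, %j the job ID, %u the user name and %x the job name. It also follows Slurm's convention that %% stands for a literal %. Any other pattern is left unchanged. It is an illustration only. The helper name expand_patterns is hypothetical, and it does not handle the zero-padding form (e.g. %4j) that Slurm also accepts.

```python
import re

# The supported subset of Slurm filename patterns, plus "%%" for a literal "%".
_PATTERN = re.compile(r"%%|%([Aajux])")


def expand_patterns(path, values):
    """Expand %A, %a, %j, %u and %x in an output file name (sketch).

    values maps the pattern letter to its value, e.g. {"j": 1234, "u": "jane"}.
    Unsupported patterns such as %N, or letters missing from values, are kept.
    """
    def replace(match):
        if match.group(0) == "%%":
            return "%"
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return _PATTERN.sub(replace, path)


if __name__ == "__main__":
    print(expand_patterns("slurm-%A_%a.out", {"A": 1234, "a": 7}))
```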

Job Arrays

Slurm-Mail honours the behaviour of sbatch's --mail-type option for job arrays. If a user specifies --mail-type=ARRAY_TASKS, Slurm-Mail will send notification e-mails for all jobs in the array. To limit the number of e-mails sent in this scenario, set the arrayMaxNotifications parameter in slurm-mail.conf to a value greater than zero.
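The sketch below expresses the arrayMaxNotifications rule as a small function. It assumes, as the wording above implies, that a value of zero (or less) means "no limit". Slurm-Mail's actual logic may differ, and should_notify is a hypothetical helper.

```python
def should_notify(already_sent, array_max_notifications):
    """Decide whether another array-task e-mail may be sent (sketch).

    Assumption: array_max_notifications <= 0 means there is no limit.
    """
    if array_max_notifications <= 0:
        return True
    return already_sent < array_max_notifications


if __name__ == "__main__":
    # With a cap of 3, only the first three task e-mails are sent.
    sent = 0
    for task_id in range(5):
        if should_notify(sent, 3):
            sent += 1
            print(f"e-mail for task {task_id}")
```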

GECOS Field Usage

Slurm-Mail uses the GECOS field of a user's passwd entry to determine their real name to use in e-mails. Slurm-Mail will split the GECOS field by the comma character and will by default use the first (zeroth) element. If your system is set-up to use a different element for the user's real name then you can change the gecosNameField parameter in slurm-mail.conf to your desired value.

For example, if your GECOS field uses the format "Last name, first name", you can set gecosNameField to 1 instead of 0.
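The sketch below shows the GECOS lookup described above, using Python's standard pwd module (Unix only). It splits the field on commas and picks the element selected by gecosNameField. Stripping whitespace and falling back to the whole field when the index is out of range are assumptions of this sketch, not necessarily Slurm-Mail's behaviour. The helper names are hypothetical.

```python
import pwd


def real_name_from_gecos(gecos, gecos_name_field=0):
    """Split a GECOS string on commas and return the selected element."""
    parts = gecos.split(",")
    if 0 <= gecos_name_field < len(parts):
        return parts[gecos_name_field].strip()
    return gecos.strip()  # assumption: fall back to the whole field


def real_name_for_user(username, gecos_name_field=0):
    """Look up a user's passwd entry and extract their real name."""
    return real_name_from_gecos(pwd.getpwnam(username).pw_gecos, gecos_name_field)
```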

Development

Clone this repository to your desktop and make sure you have a supported version of Python installed (see Requirements above).

Pull requests are welcome!

A VSCode settings file is provided in this repository and is configured to allow you to run unit tests from the GUI.

Linting

pylint is used to check that the Slurm-Mail Python source code is formatted correctly.

pip3 install --user pylint

Once installed, you can run pylint against the Slurm-Mail source code:

pylint setup.py src/slurmmail/*.py tests/unit/*.py tests/integration/docker-slurm/*.py tests/integration/*.py

Testing

In order to run the unit tests you will need to install pytest, e.g.

pip3 install --user pytest

The unit tests can be found at tests/unit and can be invoked from either VSCode or from the command line, e.g.

pytest

Integration tests can be found at tests/integration which also contains a demo.sh script which allows you to experiment with a demo of Slurm-Mail complete with MailHog as a working mail server and webmail client.

Upgrading from Slurm-Mail version 3 to 4

From version 4.0 onwards, Slurm-Mail no longer installs to /opt/slurm-mail; instead, it is installed using Python's setuptools module. If your current Slurm-Mail 3.x installation was installed by your operating system's package manager, you can simply upgrade it using your package manager (e.g. yum, dnf).

If not, then proceed as follows:

  1. Install Slurm-Mail 4 using one of the methods described above. Take note of where the scripts and config files are installed (e.g. /usr/bin and /etc).

  2. If you have not modified any template files, you can skip this step.

cp /opt/slurm-mail/conf.d/templates/* /etc/slurm-mail/templates/html/

Now check the contents of the plain text templates located at /etc/slurm-mail/templates/text and adjust them as necessary to meet your requirements.

  3. If you have not modified Slurm-Mail's style.css file, you can skip this step.

cp /opt/slurm-mail/conf.d/style.css /etc/slurm-mail/

  4. If you have not modified slurm-mail.conf, you can skip this step.

cp /opt/slurm-mail/conf.d/slurm-mail.conf /etc/slurm-mail/

  5. Update your Slurm installation's slurm.conf file:

MailProg=/usr/bin/slurm-spool-mail

Now restart slurmctld:

systemctl restart slurmctld

  6. Edit /etc/cron.d/slurm-mail and set the path of the executable to /usr/bin/slurm-send-mail.

  7. If you are sure you will no longer need the old version:

rm -rf /opt/slurm-mail

Upgrading from Slurm-Mail version 4.0-4.9 to 4.10

Starting from version 4.10, both HTML and plain text e-mail templates are provided. If you have adjusted any templates, please copy them from /etc/slurm-mail/templates to /etc/slurm-mail/templates/html, and adjust the plain text templates under /etc/slurm-mail/templates/text as necessary.

Troubleshooting

  1. Check that spool files are being created under /var/spool/slurm-mail. If they are not:
    • check that cron is working
    • check the slurmctld logs for invocations of /usr/bin/slurm-spool-mail
  2. If spool files are being created but not purged, comment out logFile in the [slurm-send-mail] section of /etc/slurm-mail/slurm-mail.conf and run /usr/bin/slurm-send-mail -v (as root) at the console.

  3. If /usr/bin/slurm-send-mail is executing correctly but you are not receiving e-mails, check the mail logs on your server for any mail delivery errors.

Contributors

Thank you to the following people who have contributed code improvements, features, found bugs and aided the development of Slurm-Mail: