-*- mode: markdown; mode: auto-fill; fill-column: 80 -*-

# README -- HPC @ UL

        Time-stamp: <Mer 2013-04-03 17:40 svarrette>
This repository holds a set of launcher scripts to be used on the UL HPC platform. They are provided for users of the infrastructure to make their life easier (and hopefully more efficient) on the following typical workflows:
* Embarrassingly parallel runs for repetitive and/or multi-parametric jobs over a Java/C/C++/Ruby/Perl/Python/R script, corresponding (normally) to the following cases:
    - serial (or sequential) tasks of similar duration, run on one node
    - serial (or sequential) tasks of varying duration, run on one node
    - serial (or sequential) tasks of varying duration, run on multiple nodes
* MPI runs on n processes (ex: HPL), with abstraction of the MPI stack and script, an option to compile the code, etc.
* MPI runs on n processes per node (ex: OSU Micro-benchmarks)
We propose here two types of contributions:

* a set of `bash` script examples that users can take as a starting point and adapt to their own workflow;
* NOT YET IMPLEMENTED: a more generic ruby script, interfaced by a YAML configuration file which holds the specifics of each user.
The UL HPC platform offers parallel computing resources, so it is important
that you make efficient use of the computing nodes, even when processing
serial jobs. In particular, you should avoid submitting purely serial jobs to
the OAR queue, as this wastes computational power (11 out of 12 cores if you
reserve a full node on gaia, for instance).
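Concretely, this is the kind of submission to avoid (a hedged illustration: the `oarsub` resource syntax below is standard OAR, but check the platform documentation for the exact settings in use):

    # Reserve one FULL node (12 cores on gaia) for one hour, yet run a
    # single serial task: 11 of the 12 cores stay idle the whole time.
    oarsub -l nodes=1,walltime=1:00:00 "./mytask.sh 1"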
A bad behaviour in this context is illustrated in
`bash/serial/NAIVE_AKA_BAD_launcher_serial.sh`, where you'll recognize a
pattern you perhaps use in your own scripts:

    for i in `seq 1 ${NB_TASKS}`; do
        ${TASK} $i
    done
If you're more familiar with UNIX, you can perhaps argue that separate
processes can be forked using the bash `&` (ampersand) control operator,
combined with the `wait` builtin command. This is illustrated in
`bash/serial/launcher_serial_ampersand.sh` and corresponds to the following
pattern:

    for i in `seq 1 ${NB_TASKS}`; do
        ${TASK} $i &
    done
    wait
This approach is straightforward and is sufficient assuming (1) you don't have
a huge number of tasks to fork and (2) each task has a similar duration.
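If you still favor the ampersand approach with a large number of tasks, a classical workaround is to `wait` after each batch of forks. A minimal sketch (not part of the repository scripts; `NB_CORES` is assumed here to match the node's core count):

    # Fork at most NB_CORES tasks at a time, waiting for each batch to finish.
    NB_CORES=12
    for i in `seq 1 ${NB_TASKS}`; do
        ${TASK} $i &
        if [ $(( i % NB_CORES )) -eq 0 ]; then wait; fi
    done
    wait    # catch the last (possibly incomplete) batch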
For all the other (serial) cases, an approach based on
[GNU parallel](https://www.gnu.org/software/parallel/) is more effective, as
it permits to easily and efficiently schedule batches of n tasks in parallel
(`-j n`), where n typically corresponds to the number of cores of the node.
This is illustrated in `bash/serial/launcher_serial.sh` and corresponds to the
following pattern:

    seq ${NB_TASKS} | parallel -u -j 12 ${TASK} {}
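Instead of hard-coding `-j 12`, you may let the launcher detect the core count; a small variant, assuming GNU coreutils' `nproc` is available on the node:

    # schedule as many concurrent tasks as there are cores on this node
    seq ${NB_TASKS} | parallel -u -j $(nproc) ${TASK} {}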
Not convinced of the interest of these approaches? Take a look at the
following completion times, measured on the chaos cluster for the task
`mytask.sh` proposed in `bash/serial/mytask.sh`:
+---------+---------------+--------+--------------+----------------------+-----------+
| NB_TASK | HOSTNAME | #CORES | TASK | APPROACH | TIME |
+---------+---------------+--------+--------------+----------------------+-----------+
| 24 | h-cluster1-32 | 12 | sleep {1-24} | Pure serial | 5m0.483s |
| 24 | h-cluster1-32 | 12 | sleep {1-24} | Ampersand + wait | 0m24.141s |
| 24 | h-cluster1-32 | 12 | sleep {1-24} | GNU Parallel (-j 12) | 0m36.404s |
| 24 | h-cluster1-32 | 12 | sleep {1-24} | GNU Parallel (-j 24) | 0m24.257s |
+---------+---------------+--------+--------------+----------------------+-----------+
The same benchmark, performed using the sample argument file (see
`bash/serial/mytask.args.example`) so as to run tasks of similar duration:
+---------+---------------+--------+---------+----------------------+-----------+
| NB_TASK | HOSTNAME | #CORES | TASK | APPROACH | TIME |
+---------+---------------+--------+---------+----------------------+-----------+
| 30 | h-cluster1-32 | 12 | sleep 2 | Pure serial | 1m0.374s |
| 30 | h-cluster1-32 | 12 | sleep 2 | Ampersand + wait | 0m2.217s |
| 30 | h-cluster1-32 | 12 | sleep 2 | GNU Parallel (-j 12) | 0m6.375s |
| 30 | h-cluster1-32 | 12 | sleep 2 | GNU Parallel (-j 24) | 0m4.255s |
+---------+---------------+--------+---------+----------------------+-----------+
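For the record, GNU parallel can also read its arguments directly from such a file through the `::::` input source (the file below is the sample shipped in this repository):

    # run one task per line of the argument file, 12 at a time
    parallel -u -j 12 ${TASK} :::: bash/serial/mytask.args.example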
Resources:

* the official GNU parallel documentation, crappy yet useful and full of concrete examples
* the SciNet wiki
* slides on GNU Parallel for Large Batches of Small Jobs
* GNU Parallel and PBS
* a GNU parallel introduction
* Using GNU Parallel to Package Multiple Jobs in a Single PBS Job
If you have hundreds of serial tasks that you want to run concurrently and you
reserved more than one node, then the above approach, while useful, would
require tens of scripts to be submitted in separate OAR jobs (each of them
reserving one full node). It is also possible to use GNU parallel in this
case, using the `--sshlogin` option (altered to use the `oarsh` connector).
This is illustrated in the generic launcher proposed in this repository.
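A minimal sketch of the underlying idea (assumptions: an OAR job where `$OAR_NODEFILE` lists the reserved cores, one line per core, and GNU parallel's `--ssh` and `--sshloginfile` options; this is NOT the actual content of the generic launcher):

    # Build the list of unique hostnames reserved for the job...
    sort -u ${OAR_NODEFILE} > nodes.txt
    # ...then spread the tasks over them, connecting via oarsh instead of ssh.
    seq ${NB_TASKS} | parallel -u -j 12 --ssh oarsh --sshloginfile nodes.txt ${TASK} {}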
You'll find an example of a launcher script for MPI jobs in
`bash/MPI/mpi_launcher.sh`. Examples of usage are proposed in `examples/MPI/`.
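At its heart, such a launcher typically ends up invoking `mpirun` in an OAR-aware way. A hedged sketch (assuming Open MPI, whose `plm_rsh_agent` MCA parameter makes it reach the remote nodes through `oarsh`; `./my_mpi_app` is a placeholder for your binary):

    # one MPI process per reserved core, as listed in the OAR node file
    mpirun -machinefile ${OAR_NODEFILE} --mca plm_rsh_agent oarsh ./my_mpi_app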
You should become familiar (if not yet) with Git; numerous tutorials and resources are available online.
The Git branching model for this repository follows the guidelines of
[gitflow](http://nvie.com/posts/a-successful-git-branching-model/). In
particular, the central repository (on GitHub) holds two main branches with an
infinite lifetime:

* `production`: the production-ready branch
* `devel`: the main branch where the latest developments take place. This is the default branch you get when you clone the repository.
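This repository is hosted on GitHub. To clone it (a hedged example: the URL assumes the repository lives under the `ULHPC` organization, so double-check it against the actual project page):

    $> git clone https://github.com/ULHPC/launcher-scripts.git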
Once cloned, initialize the (potential) git submodules etc. by running:

    $> cd launcher-scripts
    $> make setup