Felix Abecassis edited this page Jul 31, 2020 · 11 revisions

Documentation

What is pyxis?

Pyxis is a SPANK plugin for the Slurm Workload Manager. It allows unprivileged cluster users to run containerized tasks through the srun command, providing an experience very similar to bare-metal jobs:

$ srun --container-image ubuntu:20.04 --pty bash

It uses the enroot container utility and relies on the enroot system configuration for most of its behavior.

Why pyxis?

You can use enroot directly as a container runtime on your cluster. enroot is fully unprivileged, highly customizable, and does not use a separate daemon, so it should be a good fit for HPC clusters. However, the pyxis plugin, which is integrated directly into Slurm, provides a few additional advantages.

Simple interface

  • The command-line arguments are added directly to srun. Users are already familiar with this command and only have to learn a few additional arguments, instead of a new set of CLI commands to import, create, start, and remove a container.
  • The container runtime could be swapped with another one while keeping the same srun API for users. An early prototype of pyxis used LXC and provided the same command-line arguments.

Integration with Slurm

  • Since pyxis is integrated with Slurm, we can have all local tasks be part of the same container (shared filesystem and namespaces).
    This is challenging to achieve with a container runtime that is not integrated with Slurm. For example, the command srun -n 8 enroot start ubuntu creates 8 different containers. With pyxis, the equivalent command, srun -n 8 --container-image ubuntu, creates 8 processes within a single container.
  • We can add logic to simplify user workflows, such as translating Slurm environment variables to PyTorch environment variables, to seamlessly enable distributed PyTorch applications.
  • We can automatically clean up containers when a job finishes, with no need for a custom Slurm epilog.
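
To make the environment variable translation mentioned above more concrete, here is a minimal sketch of what such a mapping might look like. It is an illustration only, not the actual pyxis implementation: the function name slurm_to_pytorch_env is hypothetical, and the sketch assumes the standard Slurm task variables (SLURM_PROCID, SLURM_LOCALID, SLURM_NTASKS) and the variable names read by PyTorch's environment-based initialization (RANK, LOCAL_RANK, WORLD_SIZE).

```shell
# Hypothetical helper, for illustration only (not the pyxis implementation):
# map Slurm per-task variables to the names PyTorch's environment-based
# initialization expects.
slurm_to_pytorch_env() {
    # Do nothing when not running inside a Slurm task.
    [ -n "${SLURM_PROCID:-}" ] || return 0

    export RANK="${SLURM_PROCID}"            # global index of this task
    export LOCAL_RANK="${SLURM_LOCALID:-0}"  # index of this task on its node
    export WORLD_SIZE="${SLURM_NTASKS:-1}"   # total number of tasks in the step
}
```

A distributed PyTorch job would typically also need MASTER_ADDR and MASTER_PORT, which this sketch leaves out. The point is that this logic can live in the plugin and its container setup, so users do not have to repeat it in every job script.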

Presentations

Slurm User Group Meeting 2019: Slides
FOSDEM 2020: Slides, Video
