Update the configure-eda.sh script to install rootless docker for compute nodes. Add playbook and script to configure users to run rootless docker. Resolves #29
Showing 18 changed files with 445 additions and 1 deletion.
@@ -0,0 +1,109 @@
# Containers

Slurm supports [running jobs in unprivileged OCI containers](https://slurm.schedmd.com/containers.html).
OCI is the [Open Container Initiative](https://opencontainers.org/), an open governance structure whose purpose is to create open industry standards around container formats and runtimes.

I'll document how to add OCI support to your EDA Slurm cluster.
Note that most EDA tools are not containerized: some won't run in containers at all, and some may run in a container but not correctly.
I recommend following your EDA vendor's guidance and consulting with them.

I've seen two main motivations for using containers for EDA tools.
The first is that orchestration tools like Kubernetes and AWS Batch require jobs to run in containers.
The second is more flexibility in managing the runtime environment of the tools.
Since the EDA tools themselves aren't containerized, the container is usually used to manage the file system mounts and packages that the tools use.
If a new tool requires new packages, it is easy to update the container and distribute a new version.

## Compute node configuration

The compute node must be configured to use an unprivileged container runtime.
We'll show how to install and configure rootless Docker.

The following directions have been automated in the [creation of a custom EDA compute node AMI](custom-amis.md).

First, [install the latest Docker from the Docker yum repo](https://docs.docker.com/engine/install/rhel/).

Next, [configure Docker to run rootless](https://docs.docker.com/engine/security/rootless/).

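As a rough sketch of those two linked procedures, the commands below install Docker CE from Docker's RHEL repository as root and then set up a per-user rootless daemon. The function names and the `DRY_RUN` switch are hypothetical, added so the commands can be printed and reviewed before running them; the dnf-based OS, repository URL, and package list are assumptions based on Docker's RHEL instructions, so check the linked pages for your OS version.

```shell
#!/bin/bash
# Sketch: install Docker CE and set up rootless Docker on a RHEL-family node.
# Assumptions (not from this repository): a dnf-based OS, Docker's RHEL repo URL,
# the package list, and the hypothetical DRY_RUN switch and function names.
set -euo pipefail

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# As root: add Docker's repo and install the engine plus the rootless extras.
install_docker_ce() {
    run dnf -y install dnf-plugins-core
    run dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
    run dnf -y install docker-ce docker-ce-cli containerd.io docker-ce-rootless-extras
    # Rootless users run their own daemons, so the system-wide daemon can be disabled.
    run systemctl disable --now docker.service docker.socket
}

# As each non-root user: create and start that user's Docker daemon.
setup_rootless_docker() {
    run dockerd-rootless-setuptool.sh install
    run systemctl --user enable --now docker
}
```

Run `install_docker_ce` as root once per node (or during the AMI build), and `setup_rootless_docker` as each user. `DRY_RUN=1 install_docker_ce` prints the commands instead of running them.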
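
Configure subuid and subgid.
Each user who will run Docker must have an entry in `/etc/subuid` and `/etc/subgid`.
Rootless Docker uses these subordinate ID ranges to map user and group IDs inside the container; Docker's documentation asks for at least 65,536 IDs per user.
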
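A minimal sketch of a check for those entries follows. It assumes entries keyed by user name in the `name:first_id:count` format (entries keyed by numeric UID are not handled), and `has_subid_range` is a hypothetical helper, not part of Docker or shadow-utils.

```shell
#!/bin/bash
# Sketch: check for a subordinate ID range in /etc/subuid or /etc/subgid.
# Lines have the form "name:first_id:count"; entries keyed by numeric UID are not handled.
# has_subid_range is a hypothetical helper, not part of Docker or shadow-utils.
set -euo pipefail

# Usage: has_subid_range FILE USER [MIN_COUNT]
# Succeeds if FILE has a line for USER whose count is at least MIN_COUNT (default 65536).
has_subid_range() {
    local file=$1 user=$2 min_count=${3:-65536}
    local name first count
    # "|| [[ -n $name ]]" also processes a last line that lacks a trailing newline.
    while IFS=: read -r name first count || [[ -n "$name" ]]; do
        if [[ "$name" == "$user" && "$count" =~ ^[0-9]+$ ]] && (( 10#$count >= min_count )); then
            return 0
        fi
    done < "$file"
    return 1
}
```

For example, `has_subid_range /etc/subuid alice && has_subid_range /etc/subgid alice` succeeds only if both files have a large enough range for `alice`. If not, `usermod --add-subuids 100000-165535 --add-subgids 100000-165535 alice` (from shadow-utils) adds one; that range is only an example, and ranges for different users should not overlap.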
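
## Per user configuration

Each user must configure Docker to store images in a non-NFS location.
By default, rootless Docker stores images under `~/.local/share/docker`, which is on NFS when home directories are shared, and Docker's storage drivers generally don't support NFS as a backing file system.

`~/.config/docker/daemon.json`:

```
{
    "data-root": "/var/tmp/${USER}/containers/storage"
}
```

Replace `${USER}` with the user's login name: `daemon.json` is plain JSON, so Docker does not expand shell variables in it.
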
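Because the user name has to appear literally in `daemon.json`, it can be easier to generate the file with a small script. This is a sketch: `write_daemon_json` is a hypothetical helper, and the `/var/tmp/<user>/containers/storage` layout simply follows the example above.

```shell
#!/bin/bash
# Sketch: write a user's rootless Docker daemon.json with the user name already expanded.
# write_daemon_json is a hypothetical helper; the data-root layout follows the example above.
set -euo pipefail

# Usage: write_daemon_json USER [CONFIG_HOME]   (CONFIG_HOME defaults to ~/.config)
write_daemon_json() {
    local user=$1 config_home=${2:-$HOME/.config}
    local data_root="/var/tmp/${user}/containers/storage"
    mkdir -p "$config_home/docker"
    # The heredoc delimiter is unquoted, so the shell expands $data_root here.
    cat > "$config_home/docker/daemon.json" <<EOF
{
    "data-root": "$data_root"
}
EOF
    # Print the path of the file that was written.
    echo "$config_home/docker/daemon.json"
}
```

Run `write_daemon_json "$USER"` as the user, then restart that user's daemon with `systemctl --user restart docker` so it picks up the new `data-root`.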

## Create OCI Bundle

Each container requires an [OCI bundle](https://slurm.schedmd.com/containers.html#bundle).

The bundle directories can be stored on NFS and shared between users.
For example, you could create an `oci-bundles` directory on your shared file system.

The following example creates an Ubuntu bundle.
You can run it as root with the Docker service running, but it is better to run it with rootless Docker.

```
export OCI_BUNDLES_DIR=~/oci-bundles
export IMAGE_NAME=ubuntu
export BUNDLE_NAME=ubuntu
mkdir -p $OCI_BUNDLES_DIR/$BUNDLE_NAME
cd $OCI_BUNDLES_DIR/$BUNDLE_NAME
docker pull $IMAGE_NAME
# docker export needs a container, so create one, export its file system, then remove it
container_id=$(docker create $IMAGE_NAME)
docker export $container_id > $BUNDLE_NAME.tar
docker rm $container_id
mkdir rootfs
tar -C rootfs -xf $BUNDLE_NAME.tar
# Generate a rootless config.json, then start the container's default shell
runc spec --rootless
runc run containerid
```

The same process works for Rocky Linux 8.

```
export OCI_BUNDLES_DIR=~/oci-bundles
export IMAGE_NAME=rockylinux:8
export BUNDLE_NAME=rockylinux8
mkdir -p $OCI_BUNDLES_DIR/$BUNDLE_NAME
cd $OCI_BUNDLES_DIR/$BUNDLE_NAME
docker pull $IMAGE_NAME
# docker export needs a container, so create one, export its file system, then remove it
container_id=$(docker create $IMAGE_NAME)
docker export $container_id > $BUNDLE_NAME.tar
docker rm $container_id
mkdir rootfs
tar -C rootfs -xf $BUNDLE_NAME.tar
# Generate a rootless config.json, then start the container's default shell
runc spec --rootless
runc run containerid2
```

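The Ubuntu and Rocky Linux examples differ only in the image and bundle names, so the steps can be wrapped in a function. This is a sketch, not part of the repository: `create_oci_bundle` is a hypothetical name, and unlike the examples it pipes `docker export` straight into `tar` so no tarball is left in the bundle directory.

```shell
#!/bin/bash
# Sketch: create an OCI bundle (rootfs/ plus config.json) from a Docker image.
# create_oci_bundle is a hypothetical helper wrapping the manual steps shown above.
set -euo pipefail

# Usage: create_oci_bundle IMAGE BUNDLE_DIR
create_oci_bundle() {
    local image=$1 bundle_dir=$2
    local container_id
    mkdir -p "$bundle_dir/rootfs"
    docker pull "$image"
    # docker export needs a container, so create one without starting it.
    container_id=$(docker create "$image")
    # Stream the container's file system straight into rootfs/.
    docker export "$container_id" | tar -C "$bundle_dir/rootfs" -xf -
    docker rm "$container_id" > /dev/null
    # runc spec refuses to overwrite an existing config.json, so only generate it once.
    if [[ ! -e "$bundle_dir/config.json" ]]; then
        (cd "$bundle_dir" && runc spec --rootless)
    fi
}
```

For example, `create_oci_bundle rockylinux:8 ~/oci-bundles/rockylinux8`. Use a fresh bundle directory when rebuilding from a newer image, since extracting over an existing `rootfs` leaves stale files behind.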

## Test the bundle locally

The bundle creation steps already ran `runc spec --rootless`, and `runc spec` refuses to overwrite an existing `config.json`, so testing a bundle only needs `runc run`:

```
export OCI_BUNDLES_DIR=~/oci-bundles
export BUNDLE_NAME=rockylinux8
cd $OCI_BUNDLES_DIR/$BUNDLE_NAME
runc run containerid2
```

## Run a bundle on Slurm

```
export OCI_BUNDLES_DIR=~/oci-bundles
export BUNDLE_NAME=rockylinux8
srun -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --pty hostname
srun -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --pty bash
sbatch -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --wrap hostname
```

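Slurm only needs the bundle path, but a quick check before submitting can catch a bundle that was never finished. This sketch assumes the `rootfs` root path that `runc spec` writes by default; `check_oci_bundle` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: sanity-check an OCI bundle before submitting a Slurm job with it.
# check_oci_bundle is a hypothetical helper; it assumes runc spec's default root path "rootfs".
set -euo pipefail

# Usage: check_oci_bundle BUNDLE_DIR
check_oci_bundle() {
    local bundle_dir=$1
    if [[ ! -f "$bundle_dir/config.json" ]]; then
        echo "error: $bundle_dir/config.json not found (run 'runc spec --rootless' in the bundle)" >&2
        return 1
    fi
    if [[ ! -d "$bundle_dir/rootfs" ]]; then
        echo "error: $bundle_dir/rootfs not found" >&2
        return 1
    fi
}
```

For example, `check_oci_bundle $OCI_BUNDLES_DIR/$BUNDLE_NAME && sbatch -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --wrap hostname`. The check runs on the submit host, so it assumes the bundle directory is at the same path on the compute nodes, which is one reason a shared file system works well for bundles.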
86 changes: 86 additions & 0 deletions
source/resources/parallel-cluster/config/bin/configure-rootless-docker.sh
@@ -0,0 +1,86 @@
#!/bin/bash -ex
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

# Configure rootless docker for user.
# The slurm config directory must exist

script=$0
script_name=$(basename $script)

# Jinja2 template variables
assets_bucket={{assets_bucket}}
assets_base_key={{assets_base_key}}
export AWS_DEFAULT_REGION={{Region}}
ClusterName={{ClusterName}}
ErrorSnsTopicArn={{ErrorSnsTopicArn}}
playbooks_s3_url={{playbooks_s3_url}}

# Notify user of errors
function on_exit {
    rc=$?
    set +e
    if [[ $rc -ne 0 ]] && [[ ":$ErrorSnsTopicArn" != ":" ]]; then
        tmpfile=$(mktemp)
        echo "See log files for more info:
/var/lib/amazon/toe/TOE_*
grep PCImageBuilderEDA /var/log/messages | less" > $tmpfile
        aws --region $AWS_DEFAULT_REGION sns publish --topic-arn $ErrorSnsTopicArn --subject "${ClusterName} configure-rootless-docker.sh failed" --message file://$tmpfile
        rm $tmpfile
    fi
}
trap on_exit EXIT

# Redirect all IO to /var/log/messages and then echo to stderr
exec 1> >(logger -s -t configure-rootless-docker) 2>&1

# Install ansible
if ! yum list installed ansible &> /dev/null; then
    yum install -y ansible || amazon-linux-extras install -y ansible2
fi

external_login_node_config_dir=/opt/slurm/${ClusterName}/config
if [ -e $external_login_node_config_dir ]; then
    config_dir=$external_login_node_config_dir
else
    config_dir=/opt/slurm/config
fi
config_bin_dir=$config_dir/bin
ANSIBLE_PATH=$config_dir/ansible
PLAYBOOKS_PATH=$ANSIBLE_PATH/playbooks
PLAYBOOKS_ZIP_PATH=$ANSIBLE_PATH/playbooks.zip

if ! [ -e $external_login_node_config_dir ]; then
    mkdir -p $config_bin_dir

    ansible_head_node_vars_yml_s3_url="s3://$assets_bucket/$assets_base_key/config/ansible/ansible_head_node_vars.yml"
    ansible_compute_node_vars_yml_s3_url="s3://$assets_bucket/$assets_base_key/config/ansible/ansible_compute_node_vars.yml"
    ansible_external_login_node_vars_yml_s3_url="s3://$assets_bucket/$assets_base_key/config/ansible/ansible_external_login_node_vars.yml"

    # Download ansible playbooks
    aws s3 cp $playbooks_s3_url ${PLAYBOOKS_ZIP_PATH}.new
    if ! [ -e $PLAYBOOKS_ZIP_PATH ] || ! diff -q $PLAYBOOKS_ZIP_PATH ${PLAYBOOKS_ZIP_PATH}.new; then
        mv $PLAYBOOKS_ZIP_PATH.new $PLAYBOOKS_ZIP_PATH
        rm -rf $PLAYBOOKS_PATH
        mkdir -p $PLAYBOOKS_PATH
        pushd $PLAYBOOKS_PATH
        yum -y install unzip
        unzip $PLAYBOOKS_ZIP_PATH
        chmod -R 0700 $ANSIBLE_PATH
        popd
    fi

    aws s3 cp $ansible_head_node_vars_yml_s3_url /opt/slurm/config/ansible/ansible_head_node_vars.yml

    aws s3 cp $ansible_compute_node_vars_yml_s3_url /opt/slurm/config/ansible/ansible_compute_node_vars.yml

    aws s3 cp $ansible_external_login_node_vars_yml_s3_url /opt/slurm/config/ansible/ansible_external_login_node_vars.yml
fi

pushd $PLAYBOOKS_PATH

ansible-playbook $PLAYBOOKS_PATH/configure-rootless-docker.yml \
    -i inventories/local.yml \
    -e @$ANSIBLE_PATH/ansible_external_login_node_vars.yml

popd
91 changes: 91 additions & 0 deletions
source/resources/parallel-cluster/config/bin/install-rootless-docker.sh
@@ -0,0 +1,91 @@
#!/bin/bash -ex
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

# This script calls an ansible playbook that installs rootless docker on a compute node.
# It has 2 different use cases:
# * To build ParallelCluster AMIs
# * To install docker on VDIs or other login nodes that use a ParallelCluster cluster.
# The location of the config directory is different for those 2 use cases.
# For an AMI build, the config directory and scripts will not exist and must be downloaded from S3.
# For a login node, the playbooks and scripts will already exist.

script=$0
script_name=$(basename $script)

# Jinja2 template variables
assets_bucket={{assets_bucket}}
assets_base_key={{assets_base_key}}
export AWS_DEFAULT_REGION={{Region}}
ClusterName={{ClusterName}}
ErrorSnsTopicArn={{ErrorSnsTopicArn}}
playbooks_s3_url={{playbooks_s3_url}}

# Notify user of errors
function on_exit {
    rc=$?
    set +e
    if [[ $rc -ne 0 ]] && [[ ":$ErrorSnsTopicArn" != ":" ]]; then
        tmpfile=$(mktemp)
        echo "See log files for more info:
/var/lib/amazon/toe/TOE_*
grep PCImageBuilderEDA /var/log/messages | less" > $tmpfile
        aws --region $AWS_DEFAULT_REGION sns publish --topic-arn $ErrorSnsTopicArn --subject "${ClusterName} install-rootless-docker.sh failed" --message file://$tmpfile
        rm $tmpfile
    fi
}
trap on_exit EXIT

# Redirect all IO to /var/log/messages and then echo to stderr
exec 1> >(logger -s -t install-rootless-docker) 2>&1

# Install ansible
if ! yum list installed ansible &> /dev/null; then
    yum install -y ansible || amazon-linux-extras install -y ansible2
fi

external_login_node_config_dir=/opt/slurm/${ClusterName}/config
if [ -e $external_login_node_config_dir ]; then
    config_dir=$external_login_node_config_dir
else
    config_dir=/opt/slurm/config
fi
config_bin_dir=$config_dir/bin
ANSIBLE_PATH=$config_dir/ansible
PLAYBOOKS_PATH=$ANSIBLE_PATH/playbooks
PLAYBOOKS_ZIP_PATH=$ANSIBLE_PATH/playbooks.zip

if ! [ -e $external_login_node_config_dir ]; then
    mkdir -p $config_bin_dir

    ansible_head_node_vars_yml_s3_url="s3://$assets_bucket/$assets_base_key/config/ansible/ansible_head_node_vars.yml"
    ansible_compute_node_vars_yml_s3_url="s3://$assets_bucket/$assets_base_key/config/ansible/ansible_compute_node_vars.yml"
    ansible_external_login_node_vars_yml_s3_url="s3://$assets_bucket/$assets_base_key/config/ansible/ansible_external_login_node_vars.yml"

    # Download ansible playbooks
    aws s3 cp $playbooks_s3_url ${PLAYBOOKS_ZIP_PATH}.new
    if ! [ -e $PLAYBOOKS_ZIP_PATH ] || ! diff -q $PLAYBOOKS_ZIP_PATH ${PLAYBOOKS_ZIP_PATH}.new; then
        mv $PLAYBOOKS_ZIP_PATH.new $PLAYBOOKS_ZIP_PATH
        rm -rf $PLAYBOOKS_PATH
        mkdir -p $PLAYBOOKS_PATH
        pushd $PLAYBOOKS_PATH
        yum -y install unzip
        unzip $PLAYBOOKS_ZIP_PATH
        chmod -R 0700 $ANSIBLE_PATH
        popd
    fi

    aws s3 cp $ansible_head_node_vars_yml_s3_url /opt/slurm/config/ansible/ansible_head_node_vars.yml

    aws s3 cp $ansible_compute_node_vars_yml_s3_url /opt/slurm/config/ansible/ansible_compute_node_vars.yml

    aws s3 cp $ansible_external_login_node_vars_yml_s3_url /opt/slurm/config/ansible/ansible_external_login_node_vars.yml
fi

pushd $PLAYBOOKS_PATH

ansible-playbook $PLAYBOOKS_PATH/install-rootless-docker.yml \
    -i inventories/local.yml \
    -e @$ANSIBLE_PATH/ansible_compute_node_vars.yml

popd
@@ -10,3 +10,4 @@
- security_updates
- bug_fixes
- ParallelClusterComputeNode
- install-rootless-docker
@@ -0,0 +1,8 @@
---
- name: Configure rootless docker for user
  hosts:
    - ExternalLoginNode
  become_user: root
  become: yes
  roles:
    - configure-rootless-docker
@@ -0,0 +1,8 @@
---
- name: Install rootless docker for OCI containers
  hosts:
    - ParallelClusterComputeNode
  become_user: root
  become: yes
  roles:
    - install-rootless-docker
6 changes: 6 additions & 0 deletions
source/resources/playbooks/roles/ParallelClusterHeadNode/files/opt/slurm/etc/oci.conf
@@ -0,0 +1,6 @@
EnvExclude="^(SLURM_CONF|SLURM_CONF_SERVER)="
RunTimeEnvExclude="^(SLURM_CONF|SLURM_CONF_SERVER)="
RunTimeQuery="runc --rootless=true --root=/run/user/%U/ state %n.%u.%j.%s.%t"
RunTimeKill="runc --rootless=true --root=/run/user/%U/ kill -a %n.%u.%j.%s.%t"
RunTimeDelete="runc --rootless=true --root=/run/user/%U/ delete --force %n.%u.%j.%s.%t"
RunTimeRun="runc --rootless=true --root=/run/user/%U/ run %n.%u.%j.%s.%t -b