Rob Nagler edited this page Nov 20, 2019 · 35 revisions

Amazon Web Services (AWS)

Setup Instance

From console:

  • CentOS 7 (x86_64) - with Updates HVM
  • Root partition: 10g, encrypted
  • Security group: public-ssh
  • Launch
  • Use existing key

Once booted, get public and private IPs:

  • Add IPs to named
  • Set up host with rsconf_db.components: [ docker ]
  • Run /srv/rsconf/aws-init.sh <ip>

Proceed with the post-installation instructions.

GPU Driver install

# kernel-devel is only available for the latest kernel, so update first
yum update -y
yum install -y kernel-devel
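# if the update installed a new kernel, reboot into it
reboot
# after the reboot, remove old kernels (everything except the running one)
yum remove $(rpm -qa | grep ^kernel-3 | grep -v $(uname -r))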
yum install -y https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.243-1.x86_64.rpm
# yum-plugin-nvidia needs to be installed manually, otherwise yum fails with:
# http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/yum-plugin-nvidia-0.5-1.el7.noarch.rpm: [Errno -1] Package does not match intended download.
yum install -y http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/yum-plugin-nvidia-0.5-1.el7.noarch.rpm
yum install -y cuda-drivers
nvidia-smi
curl -s -L https://nvidia.github.io/nvidia-container-runtime/centos7/nvidia-container-runtime.repo \
    | install -m 444 /dev/stdin /etc/yum.repos.d/nvidia-container-runtime.repo
yum install -y nvidia-container-runtime
systemctl restart docker
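
The "if new kernel, then reboot" step above can be checked rather than eyeballed. A minimal sketch, assuming GNU sort (for -V) and that version sort orders kernel releases the same way rpm does, which holds for typical CentOS 7 kernels but is not guaranteed. The function names newest_kernel and needs_reboot are made up for this example:

```shell
# newest_kernel: read "version-release.arch" strings on stdin, one per line,
# and print the highest one according to GNU version sort.
newest_kernel() {
    sort -V | tail -n 1
}

# needs_reboot RUNNING NEWEST: succeeds (exit 0) when the running kernel
# is not the newest installed one.
needs_reboot() {
    [ "$1" != "$2" ]
}

# Intended use on the host (not executed here):
#   newest=$(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | newest_kernel)
#   if needs_reboot "$(uname -r)" "$newest"; then
#       echo "reboot required: running $(uname -r), installed $newest"
#   fi
```

The rpm query format is chosen so that its output has the same shape as uname -r, which makes a plain string comparison sufficient.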

Verify

docker run -it --gpus=all --net=host --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow; print(tensorflow.config.experimental.list_physical_devices("GPU"))'
<snip>
Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
<snip>
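
For a scripted check, the "Found device" lines in the output above can be counted. A minimal sketch, assuming TensorFlow keeps logging one "Found device N" line per GPU (the log format may differ between versions). The function name count_found_devices is made up for this example:

```shell
# count_found_devices: read TensorFlow log output on stdin and print the
# number of "Found device N" lines. grep -c exits non-zero when the count
# is 0, so "|| true" keeps the function's exit status at 0 in that case.
count_found_devices() {
    grep -c 'Found device [0-9]' || true
}

# Intended use on the host (not executed here); TensorFlow logs to stderr,
# and -it is dropped because the output is piped:
#   docker run --gpus=all --net=host --rm tensorflow/tensorflow:latest-gpu \
#       python -c 'import tensorflow; tensorflow.config.experimental.list_physical_devices("GPU")' \
#       2>&1 | count_found_devices
```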

Docker Container Build

See https://github.com/NVIDIA/nvidia-docker/wiki

docker run -i --name=gpu -u root radiasoft/beamsim-jupyter:prod bash <<'EOF'
rpm -i https://developer.download.nvidia.com/compute/cuda/repos/fedora29/x86_64/cuda-repo-fedora29-10.1.243-1.x86_64.rpm
dnf install -y kmodtool kernel-devel
dnf install -y cuda-drivers
EOF
docker commit --change 'USER vagrant' --change 'CMD ["/home/vagrant/.radia-run/tini", "--", "/home/vagrant/.radia-run/start"]' gpu gpu
docker rm gpu

Verify ssh key fingerprint

Export the public key from the ssh private key file in PEM format, convert it to DER format, and finally compute the md5:

ssh-keygen -e -m PEM -f ~/.ssh/id_rsa | openssl rsa -RSAPublicKey_in -outform DER | openssl md5 -c
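
The result can be compared with the fingerprint AWS reports for the key pair. A minimal sketch, assuming the aws CLI is installed and configured; the key pair name my-key and the function names normalize_fp and fp_matches are made up for this example:

```shell
# normalize_fp: strip a leading "(stdin)= " or "MD5(stdin)= " label, as
# printed by openssl md5, and lowercase the hex digits.
normalize_fp() {
    printf '%s\n' "$1" | sed -e 's/^.*= *//' | tr 'A-F' 'a-f'
}

# fp_matches A B: succeeds (exit 0) when both fingerprints are equal
# after normalization.
fp_matches() {
    [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# Intended use (not executed here); "my-key" is a placeholder name:
#   local_fp=$(ssh-keygen -e -m PEM -f ~/.ssh/id_rsa \
#       | openssl rsa -RSAPublicKey_in -outform DER | openssl md5 -c)
#   aws_fp=$(aws ec2 describe-key-pairs --key-names my-key \
#       --query 'KeyPairs[0].KeyFingerprint' --output text)
#   fp_matches "$local_fp" "$aws_fp" && echo "fingerprints match"
```

Normalizing first avoids false mismatches caused by the label openssl prints in front of the digest, which differs between OpenSSL versions.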