AWS

Amazon Web Services (AWS)

Setup Instance

From console:

CentOS 7 (x86_64) - with Updates HVM
Root partition: 10g, encrypted
Security group: public-ssh
Launch
Use existing key

Once booted, get the public and private IPs, then:

Add the IPs to named
Set up the host with rsconf_db.components: [ docker ]
Run /srv/rsconf/aws-init.sh <ip>
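
The two IPs can be read from the console. As an alternative, here is a minimal sketch that assumes the AWS CLI is installed and configured, which this page does not otherwise require. The function name `instance_ips` is hypothetical and the instance ID is whatever the console assigned at launch.

```shell
# Hypothetical helper: print "<public-ip><TAB><private-ip>" for one instance.
# Assumes a configured AWS CLI; $1 is the instance ID shown in the console.
instance_ips() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].[PublicIpAddress,PrivateIpAddress]' \
        --output text
}
```

Call it as `instance_ips <instance-id>`. The public IP is the first field and the private IP is the second.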

Proceed with the post-installation instructions.

GPU Driver install

# kernel-devel always installs for the latest kernel, so update first
yum update -y
yum install -y kernel-devel
# if the update installed a new kernel, reboot into it
reboot
# remove old kernels that no longer match the running one
yum remove $(rpm -qa | grep ^kernel-3 | grep -v $(uname -r))
yum install -y https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.243-1.x86_64.rpm
# yum-plugin-nvidia needs to be installed manually, because the automatic download fails with:
# http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/yum-plugin-nvidia-0.5-1.el7.noarch.rpm: [Errno -1] Package does not match intended download.
yum install -y http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/yum-plugin-nvidia-0.5-1.el7.noarch.rpm
yum install -y cuda-drivers
nvidia-smi
curl -s -L https://nvidia.github.io/nvidia-container-runtime/centos7/nvidia-container-runtime.repo \
    | install -m 444 /dev/stdin /etc/yum.repos.d/nvidia-container-runtime.repo
yum install -y nvidia-container-runtime
systemctl restart docker
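
The "if new kernel, then reboot" step can be decided by comparing the running kernel with the most recently installed kernel package. The sketch below assumes that the newest install is the one `yum update` just added. Both function names are hypothetical.

```shell
# Print the version of the most recently installed kernel package.
# rpm -q --last lists packages newest install first.
newest_installed_kernel() {
    rpm -q --last kernel | head -n 1 | awk '{print $1}' | sed 's/^kernel-//'
}

# Succeed (exit 0) when the running kernel ($1) differs from the newest
# installed kernel ($2), i.e. a reboot is still pending.
kernel_reboot_needed() {
    [ "$1" != "$2" ]
}
```

On the host, this would be used as `kernel_reboot_needed "$(uname -r)" "$(newest_installed_kernel)" && reboot`.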

Verify

docker run -it --gpus=all --net=host --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow; print(tensorflow.config.experimental.list_physical_devices("GPU"))'
<snip>
Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
<snip>
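
If the container test fails, it may help to first confirm that the host driver sees the GPU. The following is a minimal sketch using the `nvidia-smi` query options. The function names `list_gpus` and `have_gpu` are hypothetical.

```shell
# Print one "name, driver_version" line per GPU visible to the host driver.
list_gpus() {
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Succeed only if at least one GPU is reported.
have_gpu() {
    [ "$(list_gpus 2>/dev/null | grep -c .)" -gt 0 ]
}
```

For example, `have_gpu || echo 'no GPU visible on host'` separates a driver problem from a Docker runtime problem.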

Docker Container Build

See https://github.com/NVIDIA/nvidia-docker/wiki

docker run -it --name=gpu -u root radiasoft/beamsim-jupyter:prod bash <<'EOF'
rpm -i https://developer.download.nvidia.com/compute/cuda/repos/fedora29/x86_64/cuda-repo-fedora29-10.1.243-1.x86_64.rpm
dnf install -y kmodtool kernel-devel
dnf install -y cuda-drivers
EOF
docker commit --change 'USER vagrant' --change 'CMD ["/home/vagrant/.radia-run/tini", "--", "/home/vagrant/.radia-run/start"]' gpu gpu
docker rm gpu

Verify ssh key fingerprint

To compare against the fingerprint AWS shows, export the public key from the ssh private key file in PEM format, convert it to DER format, and finally compute the md5:

ssh-keygen -e -m PEM -f ~/.ssh/id_rsa | openssl rsa -RSAPublicKey_in -outform DER | openssl md5 -c
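
The `openssl md5` output carries a label such as `(stdin)= ` before the digest, which gets in the way of a direct comparison. The sketch below wraps the pipeline above, strips that label, and compares the result with a fingerprint copied from the console. All three function names are hypothetical.

```shell
# Strip the "(stdin)= " or "MD5(stdin)= " label that openssl prints before the digest.
strip_digest_label() {
    sed 's/^.*= *//'
}

# Print the colon-separated MD5 fingerprint for the key file given as $1.
local_key_fingerprint() {
    ssh-keygen -e -m PEM -f "$1" \
        | openssl rsa -RSAPublicKey_in -outform DER 2>/dev/null \
        | openssl md5 -c \
        | strip_digest_label
}

# Succeed when the local fingerprint of $1 equals the expected value $2.
fingerprint_matches() {
    [ "$(local_key_fingerprint "$1")" = "$2" ]
}
```

A call would look like `fingerprint_matches ~/.ssh/id_rsa '<fingerprint from console>' && echo ok`.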
