AWS
Evan Carlin edited this page May 4, 2022 · 35 revisions
From console:
- Launch Instance
- AWS Marketplace
- Search "centos"
- CentOS 7 (x86_64) - with Updates HVM
- Root partition: 10g, encrypted
- Security group: public-ssh
- Launch
- Use existing key
Once booted, get public and private IPs:
- Add IPs to named
  - Create a record in $net_conf_init with the internet and backnet IP addresses (the public and private IP addresses in AWS). Create an Elastic IP address through AWS; otherwise the public IP changes each time the machine is stopped and started.
  - Create a record in $host_conf that associates the names you created in the previous step with the desired hostnames. For example, if in the previous step you created aws_foo6i 'x.x.x.234/32', then here you'll create foo6i => ['aws_m503i', 234],. Again, do this for both internet and backnet.
  - Finally, associate the names with a domain. Find the domain you want under the "zones" key in the NamedConf hash (e.g. bar.com). Under $ipv4_map, add the name of the server (e.g. foo6i).
- Set up host with
rsconf_db.components: [ docker ]
bash /srv/rsconf/aws-init.sh <ip>
Proceed with post installation instructions.
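The Elastic IP and address lookup steps above can also be done from the AWS CLI rather than the console. The following is a minimal sketch, assuming the `aws` CLI is installed and configured with credentials for the right region; the function names are hypothetical and not part of rsconf.

```shell
# Sketch only: helper names are illustrative, not part of rsconf.

# Allocate a new Elastic IP for use in a VPC and print its allocation ID.
allocate_eip() {
    aws ec2 allocate-address --domain vpc --query AllocationId --output text
}

# Attach the Elastic IP with allocation ID $2 to instance $1.
associate_eip() {
    aws ec2 associate-address --instance-id "$1" --allocation-id "$2" >/dev/null
}

# Print the public and private IPs of instance $1, tab-separated.
# These are the "internet" and "backnet" addresses to add to named.
instance_ips() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].[PublicIpAddress,PrivateIpAddress]' \
        --output text
}
```

A typical sequence would be `alloc=$(allocate_eip)`, then `associate_eip <instance-id> "$alloc"`, then `instance_ips <instance-id>` to read back the addresses for $net_conf_init.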
# need kernel source which is always the latest so do update first
yum update -y
yum install -y kernel-devel
# if new kernel, then
reboot
yum remove $(rpm -qa | grep ^kernel-3 | grep -v $(uname -r))
yum install -y https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.243-1.x86_64.rpm
# needs to be installed manually:
# http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/yum-plugin-nvidia-0.5-1.el7.noarch.rpm: [Errno -1] Package does not match intended download.
yum install -y http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/yum-plugin-nvidia-0.5-1.el7.noarch.rpm
yum install -y cuda-drivers
reboot
nvidia-smi
curl -s -L https://nvidia.github.io/nvidia-container-runtime/centos7/nvidia-container-runtime.repo \
| install -m 444 /dev/stdin /etc/yum.repos.d/nvidia-container-runtime.repo
yum install -y nvidia-container-runtime
systemctl restart docker
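The "if new kernel, then reboot" step above is a manual judgment call. One way to make it explicit is to compare the running kernel with the most recently installed kernel package. This is a sketch under the assumption that `rpm -q --last kernel` lists the newest install first and that `uname -r` matches the package name minus the `kernel-` prefix, which holds on stock CentOS 7; the function names are hypothetical.

```shell
# Sketch only: helper names are illustrative.

# Print the version-release.arch of the most recently installed kernel package.
# `rpm -q --last` lists packages in order of install time, newest first.
newest_installed_kernel() {
    rpm -q --last kernel | head -n 1 | awk '{print $1}' | sed 's/^kernel-//'
}

# Succeed (exit 0) when the running kernel is not the newest installed one,
# meaning a reboot is needed before building the NVIDIA driver.
kernel_reboot_needed() {
    [ "$(uname -r)" != "$(newest_installed_kernel)" ]
}
```

After `yum update -y`, something like `if kernel_reboot_needed; then reboot; fi` would replace the manual check.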
For CUDA 11.4 (the version IndeX currently requires): https://developer.nvidia.com/cuda-11-4-4-download-archive?target_os=Linux&target_arch=x86_64&Distribution=CentOS&target_version=7&target_type=rpm_local
# need kernel source which is always the latest so do update first
yum update -y
yum install -y kernel-devel
# if new kernel, then
reboot
yum remove $(rpm -qa | grep ^kernel-3 | grep -v $(uname -r))
wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda-repo-rhel7-11-4-local-11.4.4_470.82.01-1.x86_64.rpm
rpm -i cuda-repo-rhel7-11-4-local-11.4.4_470.82.01-1.x86_64.rpm
yum clean all
yum install -y nvidia-driver-latest-dkms cuda
yum -y install cuda-drivers
reboot
nvidia-smi # verify it says CUDA 11.4
# Now set up nvidia-container-toolkit. It is confusing which packages are needed:
# nvidia-docker2, nvidia-container-toolkit, and/or nvidia-container-runtime.
# This comment clears up the confusion: https://github.com/NVIDIA/nvidia-docker/issues/1268#issuecomment-632692949
# We just need nvidia-container-toolkit since we don't use Kubernetes and have Docker > 19.03
curl -s -L https://nvidia.github.io/libnvidia-container/centos7/libnvidia-container.repo \
| install -m 444 /dev/stdin /etc/yum.repos.d/nvidia-container-toolkit.repo
yum clean expire-cache
yum install -y nvidia-container-toolkit
systemctl restart docker
docker run --rm --gpus all nvidia/cuda:11.4.0-base nvidia-smi
docker image rm nvidia/cuda:11.4.0-base
docker run -it --gpus=all --net=host --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow; print(tensorflow.config.experimental.list_physical_devices("GPU"))'
<snip>
Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
<snip>
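The "verify it says CUDA 11.4" check on `nvidia-smi` can be scripted instead of read by eye. The sketch below assumes the `nvidia-smi` banner contains a `CUDA Version: X.Y` field, as the 470-series driver prints; the function names are hypothetical.

```shell
# Sketch only: helper names are illustrative.

# Read `nvidia-smi` output on stdin and print the CUDA version from its banner.
parse_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed only when the text on stdin reports the CUDA version given in $1.
require_cuda_version() {
    found=$(parse_cuda_version)
    if [ "$found" != "$1" ]; then
        echo "expected CUDA $1, got '${found:-none}'" >&2
        return 1
    fi
}
```

On the host this would be used as `nvidia-smi | require_cuda_version 11.4`, and the same filter can be applied to the output of the `docker run --rm --gpus all nvidia/cuda:11.4.0-base nvidia-smi` check to confirm the container sees the same driver.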
To compute the MD5 fingerprint of an ssh key, export its public key in PEM format, convert that to DER format, and finally compute the md5:
ssh-keygen -e -m PEM -f ~/.ssh/id_rsa | openssl rsa -RSAPublicKey_in -outform DER | openssl md5 -c
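If the goal is to confirm that a local key matches a key pair imported into EC2, the pipeline above can be wrapped and compared against what the AWS CLI reports. This is a sketch, assuming an RSA key, a configured `aws` CLI, and that EC2 reports the MD5 form of the fingerprint for the key pair in question; the function names are hypothetical.

```shell
# Sketch only: helper names are illustrative.

# openssl prefixes the digest with a label such as "(stdin)= " or
# "MD5(stdin)= " depending on version; keep only what follows the "= ".
strip_digest_label() {
    sed 's/^.*= *//'
}

# Print the colon-separated MD5 of the DER-encoded public key for key file $1.
key_md5_fingerprint() {
    ssh-keygen -e -m PEM -f "$1" \
        | openssl rsa -RSAPublicKey_in -outform DER 2>/dev/null \
        | openssl md5 -c \
        | strip_digest_label
}

# Print the fingerprint EC2 has on record for the key pair named $1.
ec2_key_fingerprint() {
    aws ec2 describe-key-pairs --key-names "$1" \
        --query 'KeyPairs[0].KeyFingerprint' --output text
}

# Succeed when local key file $1 matches the EC2 key pair named $2.
key_matches_ec2() {
    [ "$(key_md5_fingerprint "$1")" = "$(ec2_key_fingerprint "$2")" ]
}
```

For example, `key_matches_ec2 ~/.ssh/id_rsa <key-pair-name>` exits 0 on a match, which makes it usable in an `if` before launching with "Use existing key".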