Merge branch 'main' into ol-fix-caddy

aleph-im · Feb 19, 2025 · 47c95e6 · 47c95e6
2 parents 5fa8980 + cacef0a
commit 47c95e6
Show file tree

Hide file tree

Showing 7 changed files with 280 additions and 3 deletions.
diff --git a/docs/assets/images/gpu/cli-create-gpu-instance.png b/docs/assets/images/gpu/cli-create-gpu-instance.png
diff --git a/docs/computing/gpu/index.md b/docs/computing/gpu/index.md
@@ -0,0 +1,127 @@
+# GPU Instance Creation and Configuration
+
+This section outlines the process of starting a GPU instance on the Aleph Network and configuring the GPU on it.
+
+The [aleph-client](https://github.com/aleph-im/aleph-client/) command-line tool is required.<br>
+See [CLI Reference](../../tools/aleph-client/usage.md) or use `--help` for a quick overview of a specific command.
+
+## Setup
+
+### Create an instance with GPU
+
+The CLI provides a streamlined command to create a GPU instance. You will be prompted to choose a specific GPU available on a compatible CRN (Compute Resource Node), where your instance will be deployed. Alternatively, you can create your GPU instance on [Twentysix Cloud](https://console.twentysix.cloud/).
+
+```shell
+aleph instance gpu
+```
+
+![cli-create-gpu-instance](../../assets/images/gpu/cli-create-gpu-instance.png)
+
+Your VM is now ready to use.
+
+### Retrieve VM Logs
+
+Monitor your VM's activity:
+
+```shell
+aleph instance logs <vm-hash>
+```
+
+### Access Your VM via SSH
+
+#### 1. **Find the Instance Details**
+
+- **Via CLI**:
+
+```shell
+aleph instance list
+```
+
+- **Via API**: Access the compute node's API at `https://<node-url>/about/executions/list`.
+
+#### 2. **Connect via SSH**:
+
+Use the retrieved IP address to SSH into your VM:
+
+```shell
+ssh <user>@<ip> [-i <path-to-ssh-key>]
+```
+
+- **Default Users**:
+  - Debian: `root`
+  - Ubuntu: `ubuntu`
+
+## Using the GPU inside your VM
+
+### Install NVIDIA Linux Drivers
+
+#### 1. **Installation**
+
+##### **Debian**
+
+```shell
+echo "deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware" >> /etc/apt/sources.list
+apt update -y && apt upgrade -y && apt autoremove -y
+```
+
+> ℹ️ If prompted, press Enter for `Keep the local version currently installed`.
+
+Exit your vm and then reboot it with:
+
+```shell
+aleph instance reboot <vm-hash>
+```
+
+Re-connect to your VM via SSH and run the following:
+
+```shell
+apt install linux-headers-$(uname -r) nvidia-driver software-properties-common -y
+```
+
+##### **Ubuntu**
+
+```shell
+apt update -y && apt upgrade -y && apt autoremove -y
+apt install ubuntu-drivers-common --fix-missing -y
+ubuntu-drivers --gpgpu install
+```
+
+> ℹ️ If prompted, just press Enter.
+
+#### 2. **Verify the Installation**
+
+To simply check that the installation went well, run the following:
+
+```shell
+nvidia-smi
+```
+
+Alternatively, you can try to run [ollama](https://ollama.com/) using the GPU:
+
+```shell
+curl -fsSL https://ollama.com/install.sh | sh
+ollama run deepseek-r1:1.5b "Why use a decentralized cloud?" --verbose
+```
+
+### Install NVIDIA CUDA Toolkit
+
+Ensure you already have the NVIDIA drivers installed.
+
+#### 1. **Installation**
+
+The following commands are usable for both Debian and Ubuntu.
+
+```shell
+distrib=$(echo "$(lsb_release -si | tr '[:upper:]' '[:lower:]')$(lsb_release -sr | tr -d '.')")
+wget https://developer.download.nvidia.com/compute/cuda/repos/$distrib/x86_64/cuda-keyring_1.1-1_all.deb && dpkg -i cuda-keyring_1.1-1_all.deb && rm cuda-keyring_1.1-1_all.deb
+apt update -y && apt install cuda-toolkit -y
+echo "export PATH=/usr/local/cuda/bin:$PATH" >> ~/.bashrc && source ~/.bashrc
+```
+
+#### 2. **Verify the Installation**
+
+To simply check that the installation went well, run the following:
+
+```shell
+nvcc --version
+```
diff --git a/docs/faq.md b/docs/faq.md
@@ -104,7 +104,7 @@ Twentysix Cloud is a cross-chain cloud solution that provides scalable, high-per
 Purchase ALEPH v2 on platforms such as Coinbase, KuCoin, Gate, LATOKEN, MEXC, Uniswap (Ethereum), Pancakeswap (BNB Chain), TraderJoe (Avalanche), and Raydium (Solana).
 
 ### What is Pay-As-You-Go?
-Pay-As-You-Go allows you to pay for resources as you use them on the Aleph.im network, eliminating the need to hold or stake large amounts of $ALEPH. This is currently available only on the Avalanche c-chain.
+Pay-As-You-Go allows you to pay for resources as you use them on the Aleph.im network, eliminating the need to hold or stake large amounts of $ALEPH. This is currently available  on BASE and Avalanche c-chain. More chains will be supported in the future, refer to the [supported chains](protocol/chains.md) table.
 
 ## III. Node Operators
 

diff --git a/docs/nodes/compute/advanced/enable-confidential.md b/docs/nodes/compute/advanced/enable-confidential.md
@@ -1,4 +1,4 @@
-# Confidential computing
+# Enable Confidential computing
 
 This guide outlines how to enable [Confidential Computing](../../../computing/confidential/index.md) on a CRN.
 

diff --git a/docs/nodes/compute/advanced/enable-gpu.md b/docs/nodes/compute/advanced/enable-gpu.md
@@ -0,0 +1,125 @@
+# Enabling GPU support
+
+This guide outlines how to enable GPU support on your CRN, as to allow user do deploy Instance with GPUs.
+
+Enabling GPU support for Virtualization requires changing which Kernel module (driver) is handling the GPU card, as it need the special `vfio` module to works with qemu. Do note that it will make the GPU unavailable for other purposes.
+
+To activate this feature it is also required to enable PAYG support and IPv6 on your CRN, as this is the required payment method for GPU. Follow the steps at [Enable PAYG](enable-payg.md)
+
+## Device compatible with the feature
+
+At the time of writing the GPUs listed below are compatible with the feature, more will be added soon as they are tested and validated.
+
+### Standard GPUs
+
+| GPU Model    | vCPU | RAM   | vRAM  | Price approx    |
+|--------------|------|-------|-------|-----------------|
+| L40S         | 12   | 72 GB | 48 GB | 3.33 ALEPH/hour |
+| RTX 5090     | 8    | 48 GB | 36 GB | 2.24 ALEPH/hour |
+| RTX 4090     | 6    | 36 GB | 24 GB | 1.68 ALEPH/hour |
+| RTX 3090     | 4    | 24 GB | 24 GB | 1.12 ALEPH/hour |
+| RTX 4000 ADA | 3    | 18 GB | 20 GB | 0.84 ALEPH/hour |
+
+GPUs must be connected via PCIe 4.0 16x each.
+
+## Premium GPUs
+
+
+Datacenter grade GPUs.
+
+| GPU Model | vCPU | RAM    | vRAM  | Price approx     |
+|-----------|------|--------|-------|------------------|
+| H100      | 24   | 144 GB | 80 GB | 13.33 ALEPH/hour |
+| A100      | 16   | 96 GB  | 80 GB | 8.89  ALEPH/hour |
+
+## Switch Kernel modules for GPU
+It is possible to enable multiple GPUs on one CRN
+
+Execute these command as root, use `sudo` on Ubuntu system. 
+
+1. Edit initramfs to attach GPU to vfio:
+   `vi /etc/initramfs-tools/modules` and set the content to 
+```
+attach : vfio vfio_iommu_type1 vfio_virqfd vfio_pci ids=10de:27b0,10de:22bc
+```
+
+replacing the ids `10de:27b0,10de:22bc` with ids of the VGA card, there should be 2, the Video device and the GPU audio device.
+You can get them running `lspci -nvv` as root user. 
+
+2. Edit `/etc/modules` to ensure that lods GPU to vfio:
+`vi /etc/modules` and add
+```
+attach: vfio vfio_iommu_type1 vfio_pci ids=10de:27b0,10de:22bc
+```
+
+3. Modify nvidia drivers to load after the vfio:
+`vi /etc/modprobe.d/nvidia.conf`
+
+set
+```
+softdep nouveau pre: vfio-pci
+softdep nvidia pre: vfio-pci
+softdep nvidia* pre: vfio-pci
+```
+
+4. Get the know alias from the PCI device:
+`cat /sys/bus/pci/devices/0000:01:00.0/modalias`
+replace pci address  `0000:01:00.0` with the one for your GPU, you can get it using `lspci` 
+
+5. Configure vfio module to use that devices:
+`vi /etc/modprobe.d/vfio.conf`
+
+```
+blacklist nouveau
+blacklist snd_hda_intel
+alias pci:v000010DEd000027B0sv000010DEsd000016FAbc03sc00i00 vfio-pci
+options vfio-pci ids=10de:27b0,10de:22bc
+```
+
+6. Enable modprobe vfio-pci
+```
+modprobe vfio-pci
+```
+
+7 . Update initramfs image with your changes:
+```shell
+update-initramfs -u -k all
+```
+
+8. Finally Reboot the server
+```shell
+reboot
+```
+
+9. Confirm that the `vfio-pci` kernel module is used for the card
+```shell
+lspci -k
+```
+
+Should display something similar to
+```
+[...]
+01:00.0 VGA compatible controller: NVIDIA Corporation AD104GL [RTX 4000 SFF Ada Generation] (rev a1)
+	Subsystem: NVIDIA Corporation AD104GL [RTX 4000 SFF Ada Generation]
+	Kernel driver in use: vfio-pci
+	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
+01:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)
+	Subsystem: NVIDIA Corporation Device 16fa
+	Kernel driver in use: vfio-pci
+	Kernel modules: snd_hda_intel
+[...]
+```
+
+10. Confirm that the GPU are listed and supported on the CRN index page
+No additional modification is needed inside the aleph-vm configuration, apart from [enabling PAYG](./enable-payg.md).
+
+Start your aleph-vm supervisor, open the index page and it should list your GPU
+```
+GPUs
+
+   • NVIDIA | AD104GL [RTX 4000 SFF Ada Generation] is compatible ✅
+```
+
+
+## Disabling support.
+To disable support for the GPU, revert the modification done to attach the kernel module to the GPU.
diff --git a/docs/nodes/compute/advanced/enable-payg.md b/docs/nodes/compute/advanced/enable-payg.md
@@ -0,0 +1,22 @@
+## Enable PAYG (Pay-As-You-Go)
+
+Pay-As-You-Go allows user to pay for resources as you use them on the Aleph.im network, eliminating the need to hold or
+stake large amounts of $ALEPH.
+
+The feature is currently available on BASE and Avalanche c-chain.
+
+This guide outlines how to enable [Confidential Computing](../../../computing/confidential/index.md) on a CRN.
+
+#### Configure the stream reward address
+
+1. Create an Avalanche (AVAX) or BASE wallet.
+2. Open the information of your CRN on the [aleph.im account page](https://account.aleph.im/) and enter the address in
+   the section named STREAM REWARD ADDRESS.
+
+3. Add the reward address inside the CRN configuration `/etc/aleph-vm/supervisor.env` in the form of:
+   ```
+   ALEPH_VM_PAYMENT_RECEIVER_ADDRESS="0x0000000000000000000000000000000000000000"
+   ```
+   Where `0x0000000000000000000000000000000000000000` is the address of your wallet.
+4. Restart the node with `systemctl restart aleph-vm-supervisor.service`
+5. Confirm that the address appears on the path `/status/config` on the CRN's URL/config
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -74,8 +74,10 @@ nav:
         - 'Ubuntu 22.04': nodes/compute/installation/ubuntu-22.04.md
         - 'Ubuntu 24.04': nodes/compute/installation/ubuntu-24.04.md
         - 'Troubleshooting': nodes/compute/troubleshooting.md
-        - 'Confidential computing': nodes/compute/advanced/enable-confidential.md
         - 'Configure Caddy': nodes/compute/installation/configure-caddy.md
+        - 'Enable PAYG': nodes/compute/advanced/enable-payg.md
+        - 'Enable Confidential computing': nodes/compute/advanced/enable-confidential.md
+        - 'Enable GPU support': nodes/compute/advanced/enable-gpu.md
       - 'Releases': nodes/compute/releases.md
     - 'Reliability':
       - 'Introduction': nodes/reliability/index.md
@@ -125,6 +127,7 @@ nav:
           - 'Encrypted disk image': computing/confidential/encrypted-disk.md
           - 'Instance': computing/confidential/instance.md
           - 'Troubleshooting': computing/confidential/troubleshooting.md
+        - 'GPU instances': computing/gpu/index.md
         - 'Tutorials':
           - 'Testing microVMs': guides/testing_microvms.md
           - 'Update a program': guides/update_a_program.md