-
Notifications
You must be signed in to change notification settings - Fork 83
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
f6889ec
commit 12dac03
Showing
2 changed files
with
15 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# Monitoring NVIDIA GPU Workloads | ||
|
||
GPUs play an integral part in data intensive workloads. The base infrastructure module of the Observability Accelerator proivdes the ability to deploy the NVIDIA DCGM Exporter Dashboard | ||
The dashboard utilizes metrics scraped from the '/metrics' endpoint that are exposed when running the nvidia gpu operator. | ||
|
||
!!!note | ||
In order to make use of this dashboard, you will need to have a GPU backed EKS cluster and deploy the [GPU operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/amazon-eks.html) | ||
The recommended way of deploying the GPU operator is the [Data on EKS Blueprint](https://github.com/aws-ia/terraform-aws-eks-data-addons/blob/main/nvidia-gpu-operator.tf) | ||
|
||
## Deployment | ||
|
||
This is enabled by default in the [base infrasturcture module](https://aws-observability.github.io/terraform-aws-observability-accelerator/eks/). | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters