- Kubernetes
- NVIDIA GPU Node(s)
- NVIDIA GPU Operator
- Kube Prometheus Stack
- Docker image repo with push/pull authorization
- Docker Compose v3
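Before deploying, you can sanity-check the cluster prerequisites. A minimal sketch, assuming the GPU Operator and Kube Prometheus Stack were installed into `gpu-operator` and `monitoring` namespaces (namespace names are assumptions; adjust to match your install):

```bash
# Confirm GPU nodes advertise the nvidia.com/gpu resource.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

# Confirm the GPU Operator and monitoring pods are healthy (namespaces are assumptions).
kubectl get pods -n gpu-operator
kubectl get pods -n monitoring
```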
Deploy up to ten models using Triton Inference Server on Kubernetes, with load balancing provided by Traefik and horizontal pod autoscaling driven by metrics from the Triton metrics server.
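Triton exports Prometheus metrics (on port 8002 by default), which the Kube Prometheus Stack scrapes and the horizontal pod autoscaler consumes. A minimal sketch for inspecting them after deployment, assuming a Triton pod label of `app=triton` (the label is an assumption; adjust for your chart):

```bash
# In one shell: forward the metrics port from one of the Triton pods.
kubectl port-forward "$(kubectl get pods -l app=triton -o name | head -n 1)" 8002:8002

# In another shell: inspect the raw inference metrics, e.g. request counts and queue durations.
curl -s localhost:8002/metrics | grep nv_inference

# Check what the autoscaler currently sees.
kubectl get hpa
```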
| Name | Platform | Inputs | Outputs | Max Batch |
| --- | --- | --- | --- | --- |
| densenet-onnx | onnxruntime_onnx | 1 | 1 | 1 |
| inception-graphdef | tensorflow_graphdef | 1 | 1 | 128 |
| log-parsing-onnx | onnxruntime_onnx | 2 | 1 | 32 |
| phishing-bert-onnx | onnxruntime_onnx | 2 | 1 | 32 |
| sid-minibert-onnx | onnxruntime_onnx | 2 | 1 | 32 |
| simple | tensorflow_graphdef | 2 | 2 | 8 |
| simple-dyna-sequence | tensorflow_graphdef | 1 | 1 | 1 |
| simple-identity | tensorflow_savedmodel | 1 | 1 | 8 |
| simple-int8 | tensorflow_graphdef | 2 | 2 | 8 |
| simple-sequence | tensorflow_graphdef | 1 | 1 | 1 |
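Once the server is running, each model's inputs, outputs, and batch limits can be checked against this table through Triton's HTTP API. A minimal sketch, assuming the server is reachable on `localhost:8000` (e.g. via `kubectl port-forward`):

```bash
# Model metadata: names, datatypes, and shapes of inputs and outputs.
curl -s localhost:8000/v2/models/simple | python3 -m json.tool

# Model configuration, including max_batch_size.
curl -s localhost:8000/v2/models/simple/config | python3 -m json.tool
```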
With the exception of `inception-graphdef` and `densenet-onnx`, the models are stored using Git LFS. To download them, you must have Git LFS installed.
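If you are unsure whether Git LFS is available on your machine, you can check before cloning:

```bash
# Prints the installed Git LFS version, or fails if LFS is missing.
git lfs version

# One-time setup that registers the LFS filters with Git.
git lfs install
```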
- Clone this repository.

  ```bash
  git clone https://github.com/tuttlebr/triton-server-demo.git
  cd triton-server-demo
  ```
- Update the model repository with the models stored on Git LFS.

  ```bash
  git lfs pull
  ```
- Build the required Docker images and download the `inception-graphdef` and `densenet-onnx` models. This script also pushes the containers to a repository accessible by your Kubernetes nodes (update the repository reference as needed) and deploys the Helm chart; see the registry sketch below.

  ```bash
  bash 00-start.sh
  ```
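  If your nodes pull from a private registry, point the images there before running the script. A hypothetical sketch (the registry name and image tag below are placeholders, not values from this repository):

  ```bash
  # Log in to the registry your Kubernetes nodes can reach.
  docker login registry.example.com

  # Retag and push an image so the cluster can pull it (names are illustrative).
  docker tag triton-client:latest registry.example.com/triton-client:latest
  docker push registry.example.com/triton-client:latest
  ```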
- Query the Triton Server. It may take a while for all pods to come online and for Triton to load all models; you can rerun this script to check the status.

  ```bash
  bash 01-querytriton.sh
  ```
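  You can also watch progress directly while waiting. A minimal sketch, assuming the default namespace and a `kubectl port-forward` to Triton's HTTP port (8000):

  ```bash
  # Watch the pods come online.
  kubectl get pods -w

  # Triton returns HTTP 200 from this endpoint once all models are loaded and ready.
  curl -sf localhost:8000/v2/health/ready && echo "server ready"
  ```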
- Run the NVIDIA `perf_analyzer` from within a Triton Client pod.

  ```bash
  bash 02-perfanalyzer.sh
  ```
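  To experiment beyond the script, `perf_analyzer` can also be invoked directly inside the client pod. A minimal sketch (the model name, service URL, and concurrency range are illustrative):

  ```bash
  # Measure throughput and latency for one model across increasing concurrency.
  # "triton-server" is a placeholder for your Triton service name.
  perf_analyzer -m simple -u triton-server:8000 --concurrency-range 1:4
  ```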
- Monitor the Triton Server by importing the `triton-server-dashboard.json` file into Grafana as a dashboard.
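  To reach the Grafana instance from the Kube Prometheus Stack, a port-forward is usually enough. A minimal sketch, assuming the chart's default service name and a `monitoring` namespace (both assumptions; adjust for your install):

  ```bash
  # Forward Grafana locally, then import triton-server-dashboard.json via
  # Dashboards -> Import in the UI at http://localhost:3000.
  kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
  ```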