The FIL backend ships as part of Triton and can be installed via the methods described in the main Triton documentation. To get up and running quickly with a Triton Docker image, follow these steps.
Note: Looking for instructions to build the FIL backend yourself? Check out our build guide.
Triton containers are available from NGC and may be pulled down via

```bash
docker pull nvcr.io/nvidia/tritonserver:22.10-py3
```
Note that the FIL backend cannot be used in the 21.06 version of this container; the 21.06.1 patch release is the earliest Triton version with a working FIL backend implementation.
To actually deploy a model, you will need to provide the serialized model and its configuration file in a specially structured directory called the "model repository." Check out the configuration guide for details on how to do this for your model.
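As a sketch of what that structure looks like, the commands below create a minimal repository skeleton. The model name `fil_model` and the serialized file name `xgboost.json` are hypothetical examples; substitute the names appropriate for your own model, and see the configuration guide for what `config.pbtxt` must contain.

```bash
# Create a repository with one model and one numbered version directory.
# (Names "fil_model" and "xgboost.json" are illustrative placeholders.)
mkdir -p model_repository/fil_model/1
touch model_repository/fil_model/config.pbtxt
# Copy your serialized model into the version directory, e.g.:
# cp xgboost.json model_repository/fil_model/1/
```

The resulting layout is `model_repository/fil_model/config.pbtxt` alongside `model_repository/fil_model/1/`, which holds the serialized model itself.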
Assuming your model repository is on the host system, you can bind-mount it into the container and start the server with the following command (ports 8000, 8001, and 8002 are Triton's default HTTP, gRPC, and metrics ports, respectively):

```bash
docker run --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${MODEL_REPO}:/models --name tritonserver nvcr.io/nvidia/tritonserver:22.10-py3 tritonserver --model-repository=/models
```
Remember that bind-mounts require an absolute path to the host directory, so `${MODEL_REPO}` should be replaced by the absolute path to the model repository directory on the host.
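Once the container is up, you can check that the server has finished loading your models and is ready to accept requests. Triton exposes a standard health endpoint on its HTTP port (8000 in the command above):

```bash
# Prints 200 once the server is ready to serve inference requests;
# prints 000 if nothing is listening on the port yet.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```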
Assuming you started your container with the name "tritonserver" as in the snippet above, you can bring the server down and remove the container with:

```bash
docker rm -f tritonserver
```