The AMD Inference Server is an open-source tool to deploy your machine learning models and make them accessible to clients for inference. Out of the box, the server supports selected models that run on AMD CPUs, GPUs or FPGAs by leveraging existing libraries. For all these models and hardware accelerators, the server presents a common user interface based on community standards, so clients can make requests to any of them using the same API. The server provides HTTP/REST and gRPC interfaces for clients to submit requests, and C++ and Python bindings for both simplify writing client programs. You can also call the server backend directly through the native C++ API to write local applications.
- Supports client requests over the HTTP/REST, gRPC and WebSocket protocols through an API based on KServe's v2 specification
- Custom applications can call the backend directly through the native C++ API, bypassing the client protocols above
- C++ library with Python bindings to simplify making requests to the server
- Incoming requests are transparently batched based on the user specifications
- Users can define how many models, and how many instances of each, to run in parallel
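
As a rough illustration of the KServe v2-style REST interface mentioned above, the sketch below builds (but does not send) an inference request using only the Python standard library. It does not use the server's own client bindings. The server address, model name, tensor name, shape and data are all hypothetical placeholders; only the `/v2/models/<name>/infer` path and the `inputs` body layout come from the KServe v2 specification.

```python
import json
import urllib.request


def build_infer_request(server_url, model_name, tensor_name, shape, datatype, data):
    """Build, without sending, a KServe v2-style inference request."""
    # The v2 protocol describes each input tensor by name, shape, datatype
    # and a flattened list of values.
    payload = {
        "inputs": [
            {
                "name": tensor_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }
    url = f"{server_url}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: adjust the address, model and tensor to your deployment.
request = build_infer_request(
    "http://127.0.0.1:8998", "my_model", "input0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4]
)
print(request.get_method(), request.full_url)
```

With a server running and a model loaded, passing `request` to `urllib.request.urlopen` would submit it. For real client programs, the C++ and Python bindings listed above are likely the more convenient route.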
The AMD Inference Server is integrated with the following libraries out of the box:
- TensorFlow and PyTorch models with ZenDNN on AMD CPUs
- ONNX models with MIGraphX on AMD GPUs
- XModel models with Vitis AI on AMD FPGAs
- A graph of computation, including pre- and post-processing, can be written using AKS on AMD FPGAs for end-to-end inference
The documentation for the AMD Inference Server is available online.
Check out the Quickstart on how to get started.
Raise issues if you find a bug or need help. Refer to Contributing for more information.
The AMD Inference Server is licensed under the terms of Apache 2.0 (see LICENSE). The LICENSE file contains additional license information for third-party files distributed with this work. More license information can be seen in the dependencies.