This repository walks through an example of what an end-to-end MLOps pipeline could look like. It uses all open source tools:
- Pachyderm to manage, version and transform data
- Determined to train a model and manage model versions
- Seldon to deploy models and request predictions
In practice, this walkthrough uses Pachyderm Enterprise and Seldon Deploy, the commercial editions of those tools, because they introduce some additional complexity worth covering and because these are the products normally found in production.
The overall integration will rely on the following Google Cloud infrastructure:
- All Pachyderm, Determined and Seldon components will be deployed on a GKE cluster
- Pachyderm will use a bucket to store the repositories
- Determined will use a bucket to store model checkpoints
- Seldon will use a bucket to store data for the model drift and outlier detectors
- Google Container Registry (GCR) will be used to store the container images for the two Pachyderm pipelines and the Seldon serving image
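To make the layout above easier to picture, here is a minimal Python sketch that records the resources in one place and derives the bucket and image URIs from them. Every name in it (the project ID, cluster name, bucket names and image names) is a hypothetical placeholder and not taken from this repository; the only fixed conventions are the `gs://` scheme for Cloud Storage buckets and the `gcr.io/<project>/<image>:<tag>` form for images in the registry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InfraLayout:
    """Google Cloud resources for the integration (all values are placeholders)."""

    project_id: str
    gke_cluster: str        # hosts Pachyderm, Determined and Seldon
    pachyderm_bucket: str   # Pachyderm repositories
    determined_bucket: str  # Determined model checkpoints
    seldon_bucket: str      # data for Seldon drift and outlier detectors

    def bucket_uris(self) -> dict[str, str]:
        """Return the gs:// URI of each component's bucket."""
        return {
            "pachyderm": f"gs://{self.pachyderm_bucket}",
            "determined": f"gs://{self.determined_bucket}",
            "seldon": f"gs://{self.seldon_bucket}",
        }

    def image_uri(self, name: str, tag: str = "latest") -> str:
        """Return the registry path for a container image in this project."""
        return f"gcr.io/{self.project_id}/{name}:{tag}"


# Hypothetical values, chosen only for illustration.
layout = InfraLayout(
    project_id="example-project",
    gke_cluster="example-mlops-cluster",
    pachyderm_bucket="example-pachyderm-repos",
    determined_bucket="example-determined-checkpoints",
    seldon_bucket="example-seldon-detectors",
)

# Two Pachyderm pipeline images plus the Seldon serving image (placeholder names).
IMAGES = ("pachyderm-pipeline-1", "pachyderm-pipeline-2", "seldon-serving")

if __name__ == "__main__":
    for component, uri in layout.bucket_uris().items():
        print(f"{component}: {uri}")
    for image in IMAGES:
        print(layout.image_uri(image))
```

Keeping these names in a single structure is only one possible choice; the same values could equally live in environment variables or in a Helm values file.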
To keep the explanation simple, let's break the integration description into a series of steps: