- Recap of what we've done so far: Design -> Train
- We'll be adding another element in this flow: Design -> Train -> Operate
```mermaid
graph LR
    A[DESIGN] --> B[TRAIN]
    B --> C[OPERATE]
    B --> |experiment tracking| B
    B --> |Training Pipeline| D((model.bin))
    D --> |Deployment| C
```
- There are primarily two kinds of deployments:
- Batch (offline) - runs regularly
- Online - Up & running all the time with two sub-options:
- Web service
- Streaming
- A batch mode "scores" data based on a pre-trained model on a regular interval (e.g. hourly, daily, monthly)
- This suits use cases where the activity being supported doesn't happen in real time and can be batched
- For example, if I'm sending users promotional offers based on their likelihood to churn from my service, I can easily do this daily
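- Here's a minimal sketch of what such a batch scoring job could look like, assuming a pickled scikit-learn model and a parquet file of inputs (the file names and columns are illustrative, not from the course):

```python
# score_batch.py -- run on a schedule (e.g. daily via cron or a workflow orchestrator)
import pickle

import pandas as pd

# Load the pre-trained model produced by the training pipeline
with open("model.bin", "rb") as f_in:
    model = pickle.load(f_in)

# Pull the latest batch of records to score (illustrative path and columns)
df = pd.read_parquet("data/latest_batch.parquet")

# Score every row and write the results somewhere downstream systems can read them
df["prediction"] = model.predict(df[["feature_1", "feature_2"]])
df[["id", "prediction"]].to_parquet("output/predictions.parquet")
```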
- The duration prediction we've been exploring is a perfect use case for a prediction that can be returned by calling a "Ride Duration Prediction Service"
- This produces a 1:1 relationship between the client (`BackendService`) and the server (`DurationPredictionService`), with a connection that stays alive only as long as it takes to process the request
- This is an important distinction, as it differentiates the web service from the streaming approach below
- Here's an example of how this could look from a sequence point of view:
```mermaid
sequenceDiagram
    participant User
    participant MobileApp
    participant BackendService
    participant DurationPredictionService
    User->>+MobileApp: Get ride duration and cost
    MobileApp->>+BackendService: {...}
    BackendService->>+DurationPredictionService: {user_payload}
    DurationPredictionService-->>-BackendService: Returns {duration: 30s, cost: $25}
    BackendService-->>-MobileApp: Returns {duration: 30s, cost: $25}
    MobileApp-->>-User: Shows result
```
- The streaming use case builds on the web service concept by decoupling the client from the server and establishing a many:many relationship between `Producers` and `Consumers`
- Here's an example of what that flow could look like:
```mermaid
graph LR
    User[User] --> MobileApp[MobileApp]
    MobileApp --> ProducerEventStream[Producer Event Stream]
    ProducerEventStream --> Consumer1{NSFW Service}
    ProducerEventStream --> Consumer2{Copyright Service}
    ProducerEventStream --> Consumer3{Quality Service}
    Consumer1 --> ConsumerEventStream[Consumer Event Stream]
    Consumer2 --> ConsumerEventStream
    Consumer3 --> ConsumerEventStream
    ConsumerEventStream --> ModerationDecision{Moderation Decision}
    ModerationDecision -- "Flag Content for Removal" --> MobileApp[MobileApp]
```
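- As a sketch of the producer side, here's what publishing an event could look like, assuming an AWS Kinesis stream accessed via boto3 (the stream name and payload are illustrative; the course's streaming setup may differ):

```python
# Sketch of a producer publishing an event to a stream (assumes AWS Kinesis via boto3)
import json

import boto3

kinesis = boto3.client("kinesis")

event = {"content_id": "abc123", "user_id": 42, "media_url": "https://example.com/video"}

# The producer just publishes; it neither knows nor cares which consumers read this
kinesis.put_record(
    StreamName="producer-event-stream",   # illustrative stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=str(event["user_id"]),
)
```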
- In this flow, the equivalent of the `BackendService` is the `Producer Event Stream`
- It doesn't establish a connection with the `Consumers` or really care what they do with the data it's publishing
- That role is fulfilled downstream by a `Consumer Event Stream`, which feeds its data to a Moderation Decision that decides whether the content should be flagged or not
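- For the consumer side, a service like the NSFW Service could poll the producer stream, apply its model, and publish its verdict to the consumer event stream; here's a rough boto3/Kinesis sketch with illustrative names (the model call itself is omitted):

```python
# Sketch of one consumer (e.g. the NSFW Service) reading events and publishing results
import json

import boto3

kinesis = boto3.client("kinesis")

# Get an iterator over the producer event stream (single-shard sketch)
shard_it = kinesis.get_shard_iterator(
    StreamName="producer-event-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]

records = kinesis.get_records(ShardIterator=shard_it, Limit=10)["Records"]
for record in records:
    event = json.loads(record["Data"])
    # The model call is omitted; this verdict is a placeholder
    verdict = {"content_id": event["content_id"], "nsfw_score": 0.02}
    kinesis.put_record(
        StreamName="consumer-event-stream",
        Data=json.dumps(verdict).encode("utf-8"),
        PartitionKey=event["content_id"],
    )
```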
- One suggestion that isn't covered in the course is to set up your directory as a Python package
- This would enforce some amount of organization and also allow easier access to Python functions throughout your work
- Here's a suggested structure for organizing this work:
```
model_app/
├── __init__.py
├── src/
│   ├── __init__.py
│   └── predict.py
├── tests/
│   ├── __init__.py
│   └── test.py
└── models/
    ├── model1.bin
    └── model2.bin
```
- Using this structure and including `__init__.py` throughout the package structure indicates that each folder should be treated as a package
- Anything in `__init__.py` will be executed when the package is imported
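- As a small illustration of the benefit, a test file can then import the prediction code through the package path; the module and function names below are assumptions matching the suggested layout:

```python
# tests/test.py -- with __init__.py files in place, the prediction code is importable
# through the package path (assumes src/predict.py exposes a predict() function
# and that model_app's parent directory is on PYTHONPATH)
from model_app.src.predict import predict

# A candidate ride to sanity-check the prediction code (feature names are illustrative)
sample_ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40}
print(predict(sample_ride))
```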
- To deploy a model as a web service, we'll need to do these high-level steps:
- Create a virtual environment
- Create the script for doing the predictions
- Put the script into a Flask app
- Package the app into Docker
- We'll go through these step by step:
- Create a virtual environment:
- There are many ways to do this; Pipenv is a very common choice (an alternative would be something like Poetry)
- The way it works is like this:
- You install a package using `pipenv install <package>`
- pipenv then adds the package to your Pipfile (a human-readable file capturing all the project's dependencies) and creates Pipfile.lock (which contains machine-readable details about the dependencies and their exact versions)
- Create the script for doing the predictions
- This would be your `src/predict.py` file
- You can accompany it with a `tests/test.py` file that calls it to check a few candidate inputs
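- Here's a minimal sketch of what `src/predict.py` could contain, assuming a pickled scikit-learn pipeline that accepts a list of feature dicts (paths and feature names are illustrative, not the course's exact code):

```python
# src/predict.py -- minimal prediction module (sketch)
import pickle

# Load the pre-trained model once, at import time
with open("models/model1.bin", "rb") as f_in:
    model = pickle.load(f_in)


def prepare_features(ride: dict) -> dict:
    """Turn a raw ride payload into the features the model expects."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


def predict(ride: dict) -> float:
    """Return the predicted ride duration (in minutes)."""
    features = prepare_features(ride)
    preds = model.predict([features])  # assumes a pipeline with a DictVectorizer step
    return float(preds[0])
```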
- Put the script into a Flask app
- The main parts here are creating the Flask `app` object and adding a route that reads the request payload, calls the prediction function, and returns the result as JSON (see the sketch below)
- An important callout here is to use a proper WSGI server, `gunicorn`, to host your prediction service
- You install it with `pipenv install gunicorn`
- And then run: `gunicorn --bind=localhost:9696 predict:app`
- `app` is the Flask object instantiated in the predict.py script
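- Putting it together, the Flask wrapper could look like the sketch below; it reuses the `predict()` function from the `predict.py` sketch above, and the endpoint name and payload fields are assumptions:

```python
# predict.py (continued) -- Flask app around the predict() function sketched earlier
from flask import Flask, request, jsonify

app = Flask("duration-prediction")


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    ride = request.get_json()   # payload sent by the client, e.g. {"PULocationID": 10, ...}
    pred = predict(ride)        # predict() as defined in the earlier sketch (same file)
    return jsonify({"duration": pred})


if __name__ == "__main__":
    # Flask's built-in server is only for local debugging; use gunicorn otherwise
    app.run(debug=True, host="0.0.0.0", port=9696)
```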
- Package the app into Docker
- To do this, you'll need to make sure to have a few things:
- Your pipenv artifacts: Pipfile and Pipfile.lock
- The predict.py where your application code lives and the model binary
- With that, you can create a Dockerfile specifying all these artifacts and launching the `gunicorn` web server. Here's the Dockerfile:
```dockerfile
FROM python:3.9.7-slim

RUN pip install -U pip
RUN pip install pipenv

WORKDIR /app

COPY [ "Pipfile", "Pipfile.lock", "./" ]

RUN pipenv install --system --deploy

COPY [ "predict.py", "model.bin", "./" ]

EXPOSE 9696

ENTRYPOINT [ "gunicorn", "--bind=0.0.0.0:9696", "predict:app" ]
```
- Some important callouts:
- pipenv with `--system --deploy` skips creating a virtual environment and installs the packages into the container's base Python
- you need to expose port 9696 so that when a container is deployed, you can reach the endpoint
- the entrypoint is the command that runs the gunicorn server with the `predict:app` Flask application
- Now we can build the docker image with:
```bash
docker build -t name_of_prediction_service:v1 .
```
- And then we can run the docker container with that image:
```bash
docker run -it --rm -p 9696:9696 name_of_prediction_service:v1
```
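- Once the container is up, you can sanity-check the endpoint end to end with a small request script (the payload fields are illustrative):

```python
# test_request.py -- send a sample ride to the containerized prediction service
import requests

ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40}

response = requests.post("http://localhost:9696/predict", json=ride, timeout=5)
print(response.json())  # e.g. {"duration": 25.3}
```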