Homework

Solution:

Code

Video

In this homework, we will use Credit Card Data from the previous homework.

Note: sometimes your answer doesn't match one of the options exactly. That's fine. Select the option that's closest to your solution.

Question 1

Install Pipenv
What's the version of pipenv you installed?
Use --version to find out

Question 2

Use Pipenv to install Scikit-Learn version 1.0.2
What's the first hash for scikit-learn you get in Pipfile.lock?

Note: you should create an empty folder for homework and do it there.
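
Pipfile.lock is a JSON file, so you can also read the hash programmatically instead of scrolling through it. A minimal sketch, assuming the lock file is in the current folder and scikit-learn was installed as a regular (non-dev) dependency, so it sits under the "default" section. The helper name first_hash is hypothetical:

```python
import json


def first_hash(lock_path, package, section="default"):
    """Return the first hash listed for `package` in a Pipfile.lock file.

    Assumes the usual Pipfile.lock layout: a JSON object whose "default"
    section maps package names to entries that contain a "hashes" list.
    """
    with open(lock_path, encoding="utf-8") as f:
        lock = json.load(f)
    return lock[section][package]["hashes"][0]


# Example (run in the folder that contains Pipfile.lock):
# print(first_hash("Pipfile.lock", "scikit-learn"))
```

Opening Pipfile.lock in a text editor works just as well; the script only saves you from searching by hand.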

Models

We've prepared a dictionary vectorizer and a model.

They were trained (roughly) using this code:

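from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df and y come from the Credit Card Data prepared in the previous homework
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')

# one-hot encode the categorical features, keep the numeric ones as they are
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)

model = LogisticRegression(solver='liblinear').fit(X, y)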

Note: You don't need to train the model. This code is just for your reference.

They were then saved with pickle. Download them:

DictVectorizer
LogisticRegression

With wget:

PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin

Question 3

Let's use these models!

Write a script for loading these models with pickle
Score this client:

{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}

What's the probability that this client will get a credit card?

0.162
0.391
0.601
0.993
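
A minimal sketch of such a script, assuming both files are in the current folder, scikit-learn is installed, and the target was encoded so that column 1 of predict_proba is the "gets a card" class. The helper names load_pickle and score_client are hypothetical:

```python
import pickle


def load_pickle(path):
    """Load one pickled object (for example dv.bin or model1.bin)."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def score_client(dv, model, client):
    """Return the positive-class probability for a single client dict."""
    X = dv.transform([client])  # the vectorizer expects a list of dicts
    # Assumption: column 1 of predict_proba is the positive class.
    return float(model.predict_proba(X)[0][1])


# Example (needs scikit-learn installed and both files downloaded):
# dv = load_pickle("dv.bin")
# model = load_pickle("model1.bin")
# client = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
# print(score_client(dv, model, client))
```

Only unpickle files you trust: pickle.load can execute arbitrary code from the file.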

If you're getting errors when unpickling the files, check their checksum:

$ md5sum model1.bin dv.bin
3f57f3ebfdf57a9e1368dcd0f28a4a14  model1.bin
6b7cded86a52af7e81859647fa3a5c2e  dv.bin
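
If md5sum is not available on your system (for example, on Windows), the same check can be done in Python. A minimal sketch using only the standard library. The helper name file_md5 is hypothetical, and the expected values are the ones listed above:

```python
import hashlib


def file_md5(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums listed in the homework text.
EXPECTED = {
    "model1.bin": "3f57f3ebfdf57a9e1368dcd0f28a4a14",
    "dv.bin": "6b7cded86a52af7e81859647fa3a5c2e",
}

# Example (run in the folder with the downloaded files):
# for name, expected in EXPECTED.items():
#     print(name, "OK" if file_md5(name) == expected else "MISMATCH")
```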

Question 4

Now let's serve this model as a web service

Install Flask and gunicorn (or waitress, if you're on Windows)
Write Flask code for serving the model
Now score this client using requests:

import requests

url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

What's the probability that this client will get a credit card?

0.274
0.484
0.698
0.928
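
One possible shape for the Flask code, as a sketch rather than the reference solution. The route /predict, the response key "probability", the file name predict.py, and port 9696 are all assumptions, as are the helper names:

```python
import pickle


def load_pickle(path):
    """Load one pickled object (for example dv.bin or model1.bin)."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def predict_client(dv, model, client):
    """Score one client dict and return a JSON-serializable result."""
    X = dv.transform([client])
    # Assumption: column 1 of predict_proba is the positive class.
    probability = float(model.predict_proba(X)[0][1])
    return {"probability": probability}


def create_app(dv, model):
    """Build a Flask app exposing POST /predict (the route name is an assumption)."""
    # Imported here so predict_client stays usable without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask("credit-card")

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()
        return jsonify(predict_client(dv, model, client))

    return app


# Example, in a hypothetical predict.py:
# app = create_app(load_pickle("dv.bin"), load_pickle("model1.bin"))
# Then serve it with: gunicorn --bind 0.0.0.0:9696 predict:app
```

With this layout, YOUR_URL in the snippet above would be something like http://localhost:9696/predict.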

Docker

Install Docker. We will use it for the next two questions.

For these questions, we prepared a base image: svizor/zoomcamp-model:3.9.12-slim. You'll need to use it (see Question 5 for an example).

This image is based on python:3.9.12-slim and has a logistic regression model (a different one) as well as a dictionary vectorizer inside.

This is what the Dockerfile for this image looks like:

FROM python:3.9.12-slim
WORKDIR /app
COPY ["model2.bin", "dv.bin", "./"]

We already built it and then pushed it to svizor/zoomcamp-model:3.9.12-slim.

Note: You don't need to build this Docker image; it's just for your reference.

Question 5

Download the base image svizor/zoomcamp-model:3.9.12-slim. You can do this with the docker pull command.

So what's the size of this base image?

15 MB
125 MB
275 MB
415 MB

You can get this information when running docker images - it'll be in the "SIZE" column.

Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

FROM svizor/zoomcamp-model:3.9.12-slim
# add your stuff here

Now complete it:

Install all the dependencies from the Pipenv files (Pipfile and Pipfile.lock)
Copy your Flask script
Run it with Gunicorn

After that, you can build your docker image.

Question 6

Let's run your docker container!

After running it, score this client once again:

import requests

url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()

What's the probability that this client will get a credit card now?

0.289
0.502
0.769
0.972

Submit the results

Submit your results here: https://forms.gle/jU2we8f9WeLgX3qa6
You can submit your solution multiple times. In this case, only the last submission will be used.
If your answer doesn't match the options exactly, select the closest one.

Deadline

The deadline for submitting is 10 October 2022 (Monday), 23:00 CEST (Berlin time).

After that, the form will be closed.
